Policy Studies Journal, Vol. 48, No. S1, 2020

The Growth of the Evaluation Tree in the Policy Analysis Forest: Recent Developments in Evaluation
Sebastian Lemire, Laura R. Peck, and Allan Porowski

The practice and profession of evaluation is continually evolving. From its early origin in the Great
Society years of the 1960s, through its golden years of the 1970s, its transformation under the fiscal
conservatism of the Reagan era in the 1980s, and its maturation during the performance and results
era of the 1990s, the field of evaluation continues to evolve in response to broader trends in society. This
article examines recent developments and trends in the practice and profession of evaluation. Structured
around the evaluation theory tree, the presentation of these developments elaborates on the three main
branches of evaluation: methods, use, and valuing. The concluding discussion briefly addresses the
central role of evaluation—and other types of knowledge production—in providing actionable evidence
for use in public policy and program decision making.
KEY WORDS: evaluation, policy analysis, public policy, history of evaluation, evaluation theory tree

Introduction

The practice and profession of evaluation is continually evolving. In a recent
commentary on the field of evaluation, Michael Scriven (2019) observed: “During
evaluation’s short history, as in the early days of exploring a new continent or archi-
pelago, we have been excited by discovering exotic new fauna or flora, beautiful
or radically novel in structure and behavior” (p. 54). Seizing on the metaphor of
exploration and discovery, this article explores the “exotic new fauna and flora”
of the evolving landscape of evaluation. What are the past and current contours of
the evaluation landscape? How, if at all, is the landscape changing? Are there new
emerging contours to explore? These are some of the questions we consider in what
follows.
Every field trip should have a theme. Ours is how evaluation, as both a practice and a profession, is embedded within a broader system of knowledge production for policy and program decision making, the so-called policy enterprise. This
theme makes the field trip particularly relevant for Policy Studies Journal, because
it speaks directly to the overlap between evaluation and policy analysis. As most
readers know, policy analysis falls within the political science subfield of policy
studies, which addresses both the policy process and analysis activities within that
process. As Weimer and Vining (2017) define it, “policy analysis is client-oriented
advice relevant to public decisions informed by social values” (p. 30). Broader than
evaluation, policy analysis includes almost any analysis that aims to inform pol-
icy decisions. Evaluation is a particular type of policy analysis that uses systematic
data collection and analysis to determine the worth—often the formative or sum-
mative effectiveness—of a program or policy. Formative evaluation considers pro-
gram processes, and summative evaluation examines a program or policy’s ultimate
outcomes and impacts. The overlapping but distinct fields of evaluation and policy
analysis make the exercise of mapping the evolution and current state of evaluation
topical and relevant to program evaluators and policy analysts alike.
The article proceeds in three parts. The first part briefly describes the histor-
ical development of evaluation in relation to broader social, economic, political,
and technological developments. Informed by these observations, the second part
focuses on what we see as the most salient recent developments in the practice and
profession of evaluation. With the aim of providing a coherent presentation of these
diverse trends, we use the “evaluation theory tree” (Christie & Alkin, 2013) as an
organizing framework, categorizing the identified trends according to the tree’s
three main branches: methods, use, and valuing. Part three discusses implications of
our observed trends for policy analysis.
Before proceeding to this discussion, we note that the methodological foun-
dation of the article is primarily anecdotal, reflecting the combined experience,
scholarship, and curiosity of the authors as well as input from experienced col-
leagues. Collectively, as applied evaluation practitioners and scholars, we have over 60 years of combined experience as active and committed members of the policy analysis and evaluation
communities. We have designed and implemented evaluations across varied policy
areas—including employment, welfare, education, housing, food assistance, mar-
ket development, and public health—and varied geographic regions, including the
United States, Afghanistan, Denmark, Ghana, and Nigeria. Over the course of our
many years of practice, we have held positions in academia, consulting manage-
ment firms, government research units, non-profit agencies, and for-profit research
organizations. Through a range of evaluation projects, conference presentations,
journal publications, guest editorials of special issues, hallway conversations, and
book authorship (e.g., Peck, 2020; Springer, Haas, & Porowski, 2017), we have also
had the opportunity to engage with a broad range of evaluation practitioners and
commissioners. The topics and trends we highlight in the following pages reflect
these engagements and experiences within the evaluation and policy analysis fields.

The Early Seeds and Growth of an Evaluation Tree

Evaluation can be—and has been—defined in many varied ways. Michael
Quinn Patton (1997) defines evaluation as “the systematic collection of information
about the activities, characteristics, and outcomes of programs to make judgments
about the program, improve program effectiveness, and/or inform decisions about
future programming” (p. 23). This definition speaks directly to three defining char-
acteristics of evaluation: the use of systematic data collection (methods), the central
role of making judgments about programs and policies (valuing), and the central
purpose of informing decision  making (use). We explore and expand upon these
three aspects of evaluation later in this article. As we show, methods, valuing, and
use have been fundamentally influenced and shaped by evaluation’s embedded role
in knowledge production for policy and program decision making.
Throughout its history, evaluation has been intrinsically connected to public
policy, policymaking, and policy analysis. The public policy–evaluation connec-
tion originated in the 1950s, with the birth of modern evaluation. Within the federal
government, increased attention to operations research, primarily carried out by
planning and systems analysts, economists, budget specialists, and internal audi-
tors, planted the seeds for modern-day evaluation (Weiss, 1998). During this time
period—and fueled (at least in part) by the race against the rapidly developing space
program of the Soviet Union—evaluation primarily centered on the assessment of
educational outcomes of American youth. With the goal of boosting scientific lit-
eracy, the federal government sponsored a broad range of curriculum efforts (e.g.,
the Harvard Project Physics), and funding for evaluation of these initiatives soon
followed (Weiss, 1998). The seeds of evaluation are deeply rooted in operations
research and assessment supporting the federal government.
These early seeds found a nurturing environment in the Great Society years
(roughly from the early 1960s to the mid-1970s), a period characterized by unprec-
edented growth in the size and reach of federal programs. Under the banner of the
War on Poverty, federal investments in public aid programs in education, housing,
and workforce development, for example, rose from $23.5 billion in 1950 to $428.3
billion in 1979, an increase of 600 percent in constant dollars (Bell, 1983, cited in
Shadish, Cook, & Leviton, 1991). As Shadish et al. (1991) observe, “most programs
were launched with high hopes, great dispatch, and enormous financial invest-
ments” (p. 22). In tandem with these federal investments, the demand for evidence
of the effectiveness of social programs and policies quickly arose from the public
and from legislators—social accountability became pressing. During this time, the
federal government’s demand for evaluation services continued to rise. Evaluation
was becoming a growth industry (Shadish et al., 1991).
The demand for evaluation generally concerned large-scale, multi-year demon-
stration projects, focusing primarily on value for investment: the extent to which
federally funded programs were making a difference, and at what cost. As described
in Campbell’s (1969) seminal work on reforms as experiments, three mechanisms constituted the underlying logic of using evaluation for social programming: (1) generate
program variations, (2) select the variant that reduces the problem the most (multi-
arm outcome evaluation), and (3) retain knowledge about effective programs. In line
with this thinking, and informed by Campbell and Stanley’s (1963) work on exper-
imental designs, the designs of choice were experimental and, to some extent, quasi-experimental designs, both well suited to answering questions about program impacts. The evaluations of the Juvenile Delinquency
Program (1962), the Manpower Development and Training Act of 1962, the anti-pov-
erty programs under the Economic Opportunity Act of 1964, and the Title I educa-
tional evaluation requirement under the Elementary and Secondary Education Act
(1965) are a few of the often-cited landmark evaluations of this era. Evaluation was
largely driven by legislative demand.
In response to this demand, the internal evaluation capacity of the federal
government was found lacking. While internal evaluation staff and even evalua-
tion units were beginning to emerge, the scope and size of the evaluations needed
reached beyond the evaluation capacity within the federal government. Moreover,
the need for “independent” and “objective” evaluation of government programs
further motivated contracting out evaluation work. In response, research organi-
zations such as Abt Associates, Mathematica, and MDRC, among others, emerged
in the mid-1960s, specializing in designing and implementing high-quality, rigor-
ous social experiments of large-scale, multi-year demonstration projects (e.g., Peck,
2018). Based on a survey across federal agencies, the General Accounting Office
(GAO; now the Government Accountability Office) identified 5,610 completed fed-
eral evaluations from 1973 to 1979, averaging 935 completed evaluations per year
(Comptroller General, 1980). The trunk of an evaluation tree was formed.
The first pruning of evaluation came in the early 1980s with the emergence
of fiscal conservatism, the roll-back of federal spending on social programs, and
the concomitant roll-back of evaluation spending. The political environment had
shifted, and that shift’s impact on the demand for evaluation services was immedi-
ate. Federal funds earmarked for evaluation decreased by 37 percent between 1980
and 1984 (Shadish et al., 1991, p. 27). The reduction was not solely due to fiscal
conservatism. Although previous decades had great hopes for evaluation’s abil-
ity to determine what works, the early demonstration programs often delivered
mixed findings or even null impacts (Pressman & Wildavsky, 1984; Weiss, 1998).
Accordingly, a growing skepticism about the effectiveness of social programs (as
well as the ability of evaluation to identify effective programs) contributed to the
widespread cutbacks in federal expenditures for evaluation (Shadish et al., 1991).
Interestingly, while the overall funding for evaluation was reduced, the number of
federally funded evaluations remained more or less the same; many of the evalu-
ations were now smaller and with an internal focus, with fewer larger multi-year
evaluations being contracted out (Shadish et al., 1991, p. 27). Accordingly, a larger
number of evaluations took place in house, a trend made possible by the accumula-
tion of internal evaluation capacity within the federal government over the preced-
ing decades.
Toward the end of the 1980s and into the early 1990s, federal government inter-
est in evaluation increased. In a series of reports titled Program Evaluation Issues, the
U.S. GAO (1992) expressed concern about the reduction in the nation’s ability to
support evaluation. Echoing earlier calls for social accountability, the GAO asserted
the importance of evaluation for the federal government by rhetorically asking:
“Are the federal officials who administer programs adequately informed about the
implementation and the results of those investments?” (1992, p. 4). Again, the main
stimulus for evaluation came through legislative action, ushering in a new era of
evaluation and governance. The Government Performance and Results Act (GPRA)
of 1993, one of a series of laws designed to improve government performance while
reducing the budget deficit, signified if not a novel shift then at least a renewed com-
mitment to government-wide performance measurement and management, moni-
toring outcomes over outputs, and managing toward agency- or government-wide
goals. Under GPRA, all federal departments and divisions were required to estab-
lish performance targets, monitor performance against those targets, and report on
their progress (or lack thereof). “Progress” was gauged in terms of achieving target
outcomes as opposed to generating impacts, both of which are a focus of evaluation
work.1 
A concurrent development, and one that revisited and revitalized the
“what works” question about impacts of the Great Society era, was the push for
evidence-based practice and policy: the idea that policies and programs should be
grounded in evidence, as opposed to values, ideology, fads, and other persuasions.
Evidence-based policy reform assumes, of course, an evidence base. Building on the
growing number of experimental and high-quality quasi-experimental evaluations,
the intended evidence base was accumulated in government-funded clearinghouses
established in the early 2000s, such as the What Works Clearinghouse, a flagship
initiative for evidence-based practice in education. Other well-known repositories
of evidence include the Best Evidence Encyclopedia, the Campbell Collaboration,
the Clearinghouse for Labor Evaluation and Research, and the California Evidence-
Based Clearinghouse for Child Welfare. The number of clearinghouses and other
repositories of evidence continues to grow (Westbrook, Avellar, & Seftor, 2017).
During the early 2000s, the demand for evaluation was further reinforced by a
number of legislative acts and initiatives in the federal government. The Information
Quality Act of 2001 and the Office of Management and Budget’s Guidelines
for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of
Information Disseminated by Federal Agencies both advise agencies on assuring
data quality (from collection through dissemination) as well as on developing the
appropriate administrative mechanisms to maintain these data standards. In 2002, to
strengthen GPRA, the Office of Management and Budget also created the Program
Assessment Rating Tool—a survey covering program purpose and design, strategic
planning, program management, and program results. In 2003 (then in a refreshed
version in 2007), the U.S. Department of Education's Institute of Education Sciences
issued a statement on Scientifically Based Evaluation Methods that emphasized the
use of experimental and quasi-experimental research designs that allow estimation
of least-biased, causal program impacts. Then, in 2010, GPRA was reaffirmed, and
expanded, in the form of the Government Performance and Results Modernization
Act. More recently, the bipartisan establishment of the U.S. Commission on Evidence-
Based Policymaking, charged with examining how government can make better use
of its existing data to provide evidence for future government decisions, signifies
the sustained integration of evaluation in government decision-making about social
programs and policies. Considered collectively, these legislative mandates, many of
which are bipartisan, speak directly to the federal government’s sustained interest
and investment in evidence-informed decision making.

The Evaluation Tree: Recent Trends in Methods, Valuing, and Use

With the early growth of the evaluation tree described, we begin to see that map-
ping the diverse and evolving landscape of evaluation is a nontrivial task. The reach
of evaluation, the boundaries of our practice, covers a broad range of sectors; an
even broader range of policies, programs, and projects; and a still broadening range
of themes and topics. After all, and as Shadish and colleagues (1991) dryly note, “we
can evaluate anything—including evaluation itself” (p. 19). From this perspective,
and true to the metaphor of flora and fauna, we will need a classification system, a
systema naturae in Linnaean terms, to capture and make sense of the blooming field of
evaluation wildflowers, weeds and all.
The evaluation theory tree (Christie & Alkin, 2013) serves well as an organiz-
ing framework for reflecting on recent developments in evaluation. The tree has
become a well-known organizing framework for describing and discussing evaluation theory and practice in evaluation circles. Accordingly, it offers a recognized
framework for those already familiar with the field of evaluation. For those new
to the field (and its literature), which would likely include at least some readers of
the Policy Studies Journal, the evaluation theory tree offers an accessible organizing
framework for understanding three central dimensions of evaluation theory and practice.
The original evaluation theory tree depicts evaluation theorists and scholars
by name. For our purpose of mapping the recent trends in evaluation, we inten-
tionally modify the tree to focus on the themes and approaches that characterize
the branches. The result appears in Figure 1. Following Christie and Alkin (2013),
the roots of the evaluation theory tree—the intellectual foundations from which
evaluation emerged—are described as social accountability (the demand for social
programs to demonstrate positive outcomes or impact), systematic social inquiry
(the commitment to systematic use of social scientific methods), and epistemology
(considerations about the nature and validity of knowledge). As Christie and Alkin
(2013) note, each of these roots has motivated, influenced, and provided a foun-
dational rationale for the conceptualization and development of the practice and
profession of evaluation.
We discuss the main, central branch of the theory tree first: methods. Growing
directly from the social inquiry root, the methods branch concerns research meth-
odology, including designs, methods, approaches, and techniques for conducting
evaluations. Historically, the branch grew from the work of Campbell and Stanley
(1963) on experimental and quasi-experimental designs, and centered on the valid-
ity threats and solutions associated with these designs. Since then, the branch has
grown and diversified to include a broader range of tools (program theory), evalua-
tion approaches (theory-based evaluation), research designs (covering experimental
and quasi-experimental designs, and including systematic reviews), data collection
methods, and analytical techniques, spanning across the quantitative and qualita-
tive traditions.

Figure 1. The Evaluation Tree. Source: Modified from Christie and Alkin (2013).

The valuing branch concerns the ways in which evaluators determine the “merit” and “worth” of a program or policy as part of an evaluation (Scriven, 1991, p. 139). This branch highlights a “normative” (value-based) as opposed to “positive” (fact-based) perspective; that is, it describes what should be done rather than simply
describing what is. The valuing branch considers how, by whom, and in what way
evaluative judgments about social programs are determined (Peck, Kim, & Lucio,
2012).2  Historically, contributions on the valuing branch have involved philosophi-
cal reflections on programs’ intrinsic value (or quality) or whether/how value state-
ments are socially constructed; conceptual clarifications on the meaning of terms
such as merit and worth; practical exchanges on the need for explicit criteria and
standards by which we may evaluate policies and programs; ethical considerations
concerning the roles of evaluators, program participants, and stakeholders in deter-
mining a program’s value; and a growing recognition of the ways in which differ-
ent value systems and perspectives may influence the merit and worth we assign
a program or policy. In terms of evaluation approaches, this branch includes social
justice-oriented approaches and responsive approaches to evaluation (i.e., evalua-
tion approaches that focus on the social diversity of program staff, beneficiaries, and
stakeholders, and their needs).

The remaining branch on the evaluation tree considers evaluation use, which,
as an area of inquiry, involves identifying what factors are associated with use of
an evaluation. Historically, the use of evaluations has always been a central con-
cern in the practice and profession of evaluation. As described by Alkin and King
(2016), over the years evaluators have cycled through various conceptualizations
of “use,” including instrumental, conceptual, and symbolic use. Most prominently,
instrumental use refers to the direct use of evaluation findings for decision making.
Conceptual use or enlightenment implies that an evaluation is used “to change lev-
els of knowledge, understanding, and attitude” (Peck & Gorzalski, 2009, p. 141).
Symbolic use refers to an evaluation being used “to convince others of a political
position” (Peck & Gorzalski, 2009, p. 141). Currently, a prominent concept of use is
that of process use, which refers to the potential utility of stakeholders’ participa-
tion in the evaluation process. In extending these conceptualizations of use, a broad
range of participatory and utilization-focused evaluation approaches—which all
focus on promoting use—are associated with this branch (e.g., Patton, 2011).
As this discussion reveals, there is some overlap across the three main branches
of the evaluation tree. This overlap is necessary given the integrated nature of meth-
ods, valuing, and use in evaluation practice. As we note later, several of the identi-
fied trends discussed below could have been placed on more than one branch (for
example, see the  description below of collaborative evaluation approaches under
the use branch, which also has important methods aspects). In locating the identified
trends on the evaluation tree, we have decided to follow in the footsteps of Christie
and Alkin (2013), placing the trends in line with their primary emphasis.
Having defined the three branches, we turn to discussing recent trends that align
with each. The identified trends are not equally relevant or applicable across the
many sectors and areas within the broad scope of evaluation. This is to be expected
given the diverse nature of evaluation as a field of practice. Accordingly, the trends
may apply to different degrees across the broad field of evaluation. Considered col-
lectively, however, the identified trends are fundamental to the future development
of the practice and profession of evaluation.

The Methods Branch

Of the three branches on the evaluation tree, and in addition to being the main,
central branch, the methods branch is also the fastest growing. This is not surpris-
ing given the prominent role of methods in the day-to-day practice of evaluation:
evaluators are, after all, a pragmatic bunch of practitioners. Reflecting on recent de-
velopments on the methods branch, we identify three prominent trends: (1) big data
analytics, (2) understanding how and why programs work, and (3) complexity theory
and systems thinking.
Methods Trend 1: Big Data Analytics.  A prominent development on the methods branch, and
one that will likely grow and expand for decades to come, is the emergence of big data
analytics (Bamberger, 2016; Raftree & Bamberger, 2014). Big data refers to new data
sources (e.g., tweets, data warehouses managed by corporations and state entities) as
well as to an associated suite of analytical approaches (natural language processing,
predictive analytics, machine learning). Fueled by technological advances that
allow an analyst to access, manage, and make use of data from the digitalization of
modern life, the big data revolution has ushered in a broad range of methodological
developments in evaluation. These developments include the use of natural language
processing techniques to review and code volumes of text that would otherwise be
prohibitively large and time-consuming, the use of machine learning and predictive
analytics to predict career pathways of community college students (e.g., Cho,
2016), the use of a smart phone application and imagery technology to instantly
count mosquito eggs as part of efforts to combat the Zika virus and malaria (Abt
Associates, 2017), and the sentiment analysis and social network analysis of tweets
as part of an evaluation of the peace process in Colombia (Fabra-Mata & Mygind,
2019). Big data’s list of data sources and analytical strategies continues to grow.
These trends are facilitated by a broadening range of emerging information and
communication technologies for data collection and analysis. For example, satel-
lite images and remote sensors can detect thatched roofs as a predictor of poverty
levels, drones can monitor agricultural crops and forest development, and mobile
phones/devices can collect real-time data (Bamberger, 2016; Friedman, Woodman,
& Chatterji, 2015; Raftree & Bamberger, 2014). These recent innovations are already
being intensively used by evaluators in international development circles, where
data collection in hard-to-reach areas is commonplace.
The full potential of big data for evaluation has yet to be realized. For now, the
big data-related trends appear to invoke a mixed sense of methodological promise
and peril, opportunity, and apprehension. The emergence of big data opens up new
data sources, and associated analytic methods that, if applied with rigor, are worth
pursuing in the context of evaluation. However, big data and data analytics do not in
and of themselves improve causal inference or support unbiased impact estimation,
serve as replacements of traditional designs and methods, or render causal theory
obsolete (Lemire & Petersson, 2017). To be sure, there are legitimate concerns about
the use of big data. How do we gauge the quality of these data? Are traditional stan-
dards of validity and reliability still relevant? What are the ethical issues related to
how we generate, manage, and analyze big data? Whose voices are heard (and per-
haps more importantly not heard) in the big data streams? These kinds of questions
will be central as big data use in evaluation spreads.
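
To make the text-analytics strand of this trend concrete, the sketch below shows one way an evaluator might use supervised text classification to code a large volume of open-ended responses once a small hand-coded training set exists. It is a minimal illustration: the comments, labels, and category names are invented and are not drawn from any of the studies cited above.

```python
# Minimal, hypothetical sketch: training a text classifier to code open-ended
# evaluation responses at scale, assuming a small hand-coded sample already exists.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-coded participant comments (labels and wording are illustrative only).
train_texts = [
    "The job coach helped me rewrite my resume and practice interviews",
    "Sessions were cancelled twice and nobody followed up",
    "Child care support made it possible for me to attend training",
    "The online portal kept crashing so I missed the enrollment deadline",
]
train_labels = ["service_quality", "implementation_problem",
                "support_service", "implementation_problem"]

# TF-IDF features feeding a simple logistic regression classifier.
coder = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
coder.fit(train_texts, train_labels)

# Apply the trained "coder" to the much larger uncoded corpus.
new_comments = ["My case worker never returned my calls",
                "Transportation vouchers got me to every class"]
print(coder.predict(new_comments))
```

In practice the training set would be far larger, and the predicted codes would be spot-checked against human coders before being used in an analysis.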
Methods Trend 2: Understanding How and Why Programs Work.  Another notable trend on the
methods branch is the sustained interest in how, why, and under what circumstances
programs and policies work, including better understanding of the underlying
logic of social programs and policies. Although this interest has strong and
longstanding roots in theory-based evaluation in general (Chen & Rossi, 1983) and
in the realist tradition more specifically (Pawson & Tilley, 1997), it has taken new
forms in recent years. Refreshed interest in the underlying logic of programs has
spurred methodological growth in several noteworthy directions, much of which
is still taking shape. Here we discuss two variants: case based and variance based
(Schmitt, forthcoming).

The first strand of methodological development involves case-based designs
and methods for understanding how or why programs or policies work. Rooting
their activities in the work of Charles Ragin (2014), evaluators are increasingly using
qualitative comparative analysis (QCA) as a way of identifying the program compo-
nents that individually (or in specific configurations) are logically related to a posi-
tive outcome of interest. Promoted in evaluation circles by Pattyn, Molenveld, and
Befani (2017), among others, use of QCA has spread among evaluators, especially
those doing work in theory-based evaluation. By relying on logical relationships,
as opposed to statistical associations, QCA provides an alternative to the regres-
sion-based analyses often used in social experiments and quasi-experiments. That
said, QCA has also been criticized for ignoring statistical variability, and so is by no
means a replacement for experimental- or quasi-experimental-based analyses.
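
As a minimal illustration of the crisp-set variant, the sketch below builds a simple QCA-style truth table: each row is an observed configuration of binary conditions across program sites, along with how consistently that configuration co-occurs with a positive outcome. The conditions, sites, and values are hypothetical, and a full QCA application would go on to logically minimize these configurations, typically with dedicated QCA software.

```python
# Hypothetical sketch of a crisp-set QCA truth table built with pandas.
import pandas as pd

# Illustrative site-level data: 1 = condition/outcome present, 0 = absent.
sites = pd.DataFrame({
    "dedicated_staff":   [1, 1, 0, 1, 0, 1],
    "employer_partners": [1, 0, 0, 1, 1, 1],
    "case_management":   [1, 1, 1, 0, 0, 1],
    "positive_outcome":  [1, 1, 0, 1, 0, 1],
})

conditions = ["dedicated_staff", "employer_partners", "case_management"]

# One row per observed configuration, with the share of sites in that
# configuration achieving the outcome (the configuration's "consistency").
truth_table = (sites.groupby(conditions)["positive_outcome"]
                    .agg(n_sites="size", consistency="mean")
                    .reset_index())
print(truth_table.sort_values("consistency", ascending=False))
```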
Relatedly, process tracing, an analytical approach emerging from the single
case-study design literature (Beach & Pedersen, 2013), is another means for exam-
ining how and why programs work. Process tracing aims to identify and empir-
ically examine the underlying causal mechanisms connecting specific program
components and a set of desired outcomes. This usually involves formulating the
hypothesized connections, specifying the distinct observable data patterns that
would indicate the presence of the purported causal mechanisms, collecting data
(qualitative and quantitative) on these, and considering the relative weight of evi-
dence for each mechanism using formalized empirical tests (Pattyn et al., 2017). Like
QCA, process tracing is also not a replacement for experimental- or quasi-experi-
mental-based analyses as a way to understand program impacts, but instead is an
alternative that evaluators have begun to use in recent years.
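
The formalized empirical tests used in process tracing are sometimes expressed in Bayesian terms, updating confidence in a hypothesized mechanism as each piece of evidence is observed. The sketch below illustrates that updating logic with invented probabilities; the "hoop" and "smoking gun" labels follow common process-tracing usage rather than any specific cited application.

```python
# Hypothetical sketch: Bayesian updating of confidence in a causal mechanism,
# one way the formalized evidence tests of process tracing can be expressed.
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability that the mechanism operated, given the evidence."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1.0 - prior))

confidence = 0.50  # illustrative prior belief that the mechanism operated

# "Hoop" test: evidence we would almost surely see if the mechanism operated,
# but that is also fairly common otherwise, so passing it adds modest support.
confidence = update(confidence, p_evidence_if_true=0.95, p_evidence_if_false=0.60)

# "Smoking gun" test: evidence unlikely to appear unless the mechanism operated.
confidence = update(confidence, p_evidence_if_true=0.40, p_evidence_if_false=0.05)

print(f"Posterior confidence in the mechanism: {confidence:.2f}")
```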
In contrast to these case-based methods (QCA, process tracing), variance-based
methods for understanding how and why programs work have also advanced in
recent years. This strand of methodological research focuses on experimental and
quasi-experimental impact evaluations, and their ability to provide convincing
causal evidence not only on program impacts but also on impact drivers, the mech-
anisms through which impacts arise. Here emphasis is placed on the use of eval-
uation design features and experimentally based analytic strategies to empirically
test the relative contribution of specific program components to program impacts.
Multi-armed, multi-stage, and factorial designs, for example, can be used to estimate
the relative effect of specific program components as part of a broader analysis of
program effect (Bell & Peck, 2016; Peck, 2015). These designs gain insight into how
programs work by combining specific experimental design features, implementa-
tion research, and statistical modeling.
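
To illustrate the estimation logic behind a multi-armed design, the sketch below regresses a simulated outcome on treatment-arm indicators, so that each coefficient estimates an arm's impact relative to the control group and the difference between arm coefficients speaks to the incremental value of a program component. The data, arm names, and effect sizes are simulated assumptions, not results from the cited studies.

```python
# Hypothetical sketch: estimating arm-specific impacts in a multi-armed experiment.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 900
arm = rng.choice(["control", "training_only", "training_plus_coaching"], size=n)

# Simulated earnings outcome: coaching adds an increment beyond training alone.
true_effect = {"control": 0.0, "training_only": 2.0, "training_plus_coaching": 3.5}
earnings = 20 + np.array([true_effect[a] for a in arm]) + rng.normal(0, 5, size=n)

df = pd.DataFrame({"arm": arm, "earnings": earnings})

# Each arm coefficient is that arm's estimated impact relative to the control group.
model = smf.ols("earnings ~ C(arm, Treatment(reference='control'))", data=df).fit()
print(model.summary().tables[1])
```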
Beyond designing impact evaluations to answer nuanced questions about pro-
gram effectiveness, the availability of experimental data also permits analysts to
leverage those data for further analyses. An early example is that of Bloom, Hill, and Riccio (2003), who pooled multiple experimental data sets, layered on implementation data, and analyzed which implementation features were associated with program impacts. Analysis of symmetrically predicted endogenous subgroups (ASPES; Peck, 2003, 2013) and other similar mediational analyses also exploit experimental data to analyze a program's varied pathways to impact.

The combination of advances in design and analysis creates a potent solution
that counters a longstanding criticism of social experiments as being limited in their
ability to explain how programs work. The knowledge benefits and programmatic
insights from experimental evaluations now include getting inside the black box.
From our perspective, these methodological developments—the various
designs and methods for understanding how and why programs work—all share
the potential to help better illuminate the underlying logic and impacts of programs.
Understanding how and why programs work holds great value for evaluators and
stakeholders, both from a learning perspective and for making decisions about
future program design and refinement. For this reason, we suspect that the push for
this type of information will continue in both evaluation and policy circles, as will
advances in the methods used to produce the information.
Methods Trend 3: Complexity Theory and Systems Thinking.  A third trend on the evaluation
methods branch relates to the science and theory of complexity (Forss, Marra,
& Schwartz, 2011)—which distinguish between “complicated” and “complex”
programs. Complicated programs have multiple components, levels of
implementation, and implementation sites; implementing agencies with multiple
agendas; and often multiple strands of outcomes (Funnell & Rogers, 2011).
Complicated programs are common in evaluation. In contrast, complex
programs are fluid and continuously developing, reacting and responding to
emerging challenges and opportunities, many of which are difficult to predict and
beyond the control of program staff and evaluators (Funnell & Rogers, 2011; Peck,
2015). Patton (2011) illustrates the distinction between complicated and complex
programs via a metaphor, stating that “sending a rocket into space” is complicated,
while “raising a child” is complex (p. 92): these two examples represent,
respectively, the known/knowable (complicated) versus unknown/unknowable
(complex) parts of a program. The complicated vs. complex distinction also has
been described by other researchers as “messy” problems (complicated) versus
“wicked” problems (complex; Springer et al., 2017).
Evaluators are making inroads into evaluating complicated programs, with
advances in quantitative techniques including hierarchical linear modeling,
mediation analysis, and mixed-methods approaches. Complex programs in par-
ticular pose unique challenges for evaluation, because complexity is inconsistent
with some of the assumptions of traditional methods. Established program eval-
uation approaches often assume that the program or policy is stable, or at least
one whose changes are or can be known to the evaluator. With rapidly changing
programs whose definitions of “success” can change depending on a particular
stakeholder’s perspective, a new lens is required. Addressing truly complex pro-
grams may require layering data from multiple sources, gathering perspectives
from multiple stakeholders, using agile evaluation techniques (e.g., rapid-cycle
evaluation), and applying methods to understand the interaction between stake-
holders across systems (e.g., social network analysis). Addressing complexity will
ultimately require reconciling both the art and the science of evaluation work
(Springer et al., 2017).
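
As one small example of the quantitative techniques mentioned above for complicated, multi-site programs, the sketch below fits a hierarchical (mixed-effects) model with a random intercept for each site, separating between-site variation from the within-site relationship between service dosage and outcomes. All data and variable names are simulated for illustration only.

```python
# Hypothetical sketch: a random-intercept (hierarchical) model for a multi-site program.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_sites, n_per_site = 20, 50
site = np.repeat(np.arange(n_sites), n_per_site)
site_effect = rng.normal(0, 1.0, n_sites)[site]       # site-level variation
dosage = rng.uniform(0, 10, n_sites * n_per_site)     # e.g., hours of service received
outcome = 50 + 0.8 * dosage + site_effect + rng.normal(0, 3, n_sites * n_per_site)

df = pd.DataFrame({"site": site, "dosage": dosage, "outcome": outcome})

# Random intercepts by site separate within-site dosage effects from between-site differences.
model = smf.mixedlm("outcome ~ dosage", data=df, groups=df["site"]).fit()
print(model.summary())
```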

Complexity theory also relates to systems thinking, a broad range of principles
and practices related to the idea that all programs and policies are embedded within
broader systems (Williams & Hummelbrunner, 2011). In response, some evalua-
tions need to view programs from a systems perspective. To date, evaluators have tended to talk about systems thinking more than actually practice it. Nevertheless,
practical applications of systems thinking to evaluation are starting to emerge. They
generally take the form of frameworks for understanding context (Vanderkruik &
McPherson, 2017); system mapping and system dynamic approaches for describing
the components of systems and how these connect and interact to generate change
(e.g., stock-and-flow and causal loop diagrams); and the use of social network
analysis to understand how “actors” within these systems are related (Williams &
Hummelbrunner, 2011).
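
To illustrate the social network analysis piece of this systems toolkit, the sketch below computes simple centrality measures for a small, invented network of referral ties among service organizations; in an actual evaluation, such networks would be constructed from administrative, survey, or interview data.

```python
# Hypothetical sketch: centrality measures for a small referral network of organizations.
import networkx as nx

referral_ties = [
    ("workforce_board", "community_college"),
    ("workforce_board", "housing_agency"),
    ("community_college", "employer_coalition"),
    ("housing_agency", "health_clinic"),
    ("health_clinic", "workforce_board"),
]

G = nx.Graph()
G.add_edges_from(referral_ties)

# Centrality scores flag actors that hold the system together (potential leverage points).
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)

for org in G.nodes:
    print(f"{org}: degree={degree[org]:.2f}, betweenness={betweenness[org]:.2f}")
```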
We find it promising that program complexity is increasingly being recognized
and pursued analytically as part of evaluations. Because evaluators must often
evaluate complex initiatives operating in real-world environments, approaches for
understanding this type of complexity are needed. Methods are being developed
to account for and explain the effects of complexity, rather than simply to control
for them (Kubisch, 2010; Springer & Porowski, 2012). Similarly, the application of
systems thinking in evaluation is also necessary for understanding how social pro-
grams and policies work (see above discussion of Methods Trend 2: Understanding
how and why programs work). These methodological advances offer a means for
evaluating efforts that involve systems change.

The Use Branch

A basic purpose of evaluation is its use (e.g., in the policy process, for program
operations). Current emphasis on use centers on two prominent trends: (1) evalu-
ation capacity building and (2) collaborative/participatory evaluation approaches.
Use Trend 1: Evaluation Capacity Building. Evaluation capacity refers to the organizational
capacity to conduct and use evaluations (Cousins, Goh, Elliott, & Bourgeois, 2014).
This capacity may be in the form of evaluation staff and units; guidelines, policies, and
procedures for when and how to conduct evaluations; and systems and processes
for disseminating and making use of evaluations within the organization. Relatedly,
evaluation capacity building refers to the strategies by which evaluation capacity is
developed and maintained in organizations. The institutionalization of evaluation
within organizations can be traced back to the establishment of evaluation units in the federal
government in the 1970s (as mentioned earlier); however, the recent push for evaluation
capacity building signifies another generation of institutionalization of evaluation, a
manifestation of the integration of evaluation in the fabric of organizations. The growing
scholarship on this topic covers various theoretical conceptualizations of evaluation
capacity (Bourgeois & Cousins, 2013), measurement models and validations of these
(Nielsen, Lemire, & Skov, 2011), strategies and approaches for building evaluation
capacity (Preskill & Boyle, 2008), and syntheses of the existing literature (Labin, Duffy,
Meyers, Wandersman, & Lesesne, 2012). Evidence of evaluation capacity building in
practice can be seen in many of the Obama Administration's evidence-based social policy initiatives, such as the Investing in Innovation Fund (i3), the Social Innovation Fund, and the Workforce Innovation Fund, which required both prior evidence as a condition of the grant and the conduct of a rigorous impact evaluation, supported by a federally funded technical assistance provider that works with evaluators to enhance the quality of their work.
Increasing evaluation capacity is also being driven by the development and
establishment of evaluation policies within federal government agencies, founda-
tions, and NGOs, among others. Evaluation policies are important in no small part
because they provide “guidance on how, when, and with what purpose evalua-
tions are carried out within an organization, that is, within a specific organizational,
cultural, and political context” (Christie & Lemire, 2019).3  In this way, evaluation
policies serve as mechanisms to integrate evaluation practice and help build true
learning organizations (Preskill, 1994).
Another trend emerging from evaluation capacity building is that of evaluative
thinking, which can be defined as:

critical thinking applied in the context of evaluation, motivated by an atti-
tude of inquisitiveness and a belief in the value of evidence, that involves
identifying assumptions, posing thoughtful questions, pursuing deeper un-
derstanding through reflection and perspective taking, and informing deci-
sions in preparation for action. (Buckley, Archibald, Hargraves, & Trochim,
2015, p. 378)

First defining evaluative thinking and then supporting key personnel in varied organizational settings to embrace it holds the potential to increase evaluation capacity. That said, the current focus on evaluative thinking is primarily conceptual; evaluators have made only limited attempts, with even fewer applied examples, to demonstrate evaluative thinking in practice.
This observation is not raised as a critique; boundary probing is often a necessary
and useful first step when introducing a novel concept into a field of scholarship or
practice.
We find the surge of interest in evaluation capacity building and evaluative
thinking important for several reasons. This trend signifies a central commitment
to promoting evaluation in society. To do so involves socializing evidence building,
building the capacity to conduct evaluations, and institutionalizing the use of eval-
uation findings in practice. Moreover, the demand for evaluation capacity—within
federal government agencies, foundations, and NGOs, among others—indicates a
sustained interest in and commitment to using evidence. As organizations become
“learning” entities that value evaluative thinking, the use of evaluation is more
likely to be embedded into decision-making procedures.
Use Trend 2: Collaborative/Participatory Evaluation Approaches. A second trend on the use
branch—with strong methods branch connections—is that of the expansion of
collaborative and participatory approaches to evaluation. Participatory evaluation
emphasizes stakeholder involvement (program funders, staff, and participants,
among others) in all or most of the design, implementation, and reporting stages of the
evaluation. Through such emphasis, participatory evaluations offer stakeholders a
voice in how the evaluation is designed and implemented, promoting a sense of
ownership and empowerment and enhancing use.
A particular type of participatory evaluation is “developmental evaluation,”
which combines utilization-focused principles with both complexity concepts
and stakeholder involvement (Patton, 2011). Developmental evaluation aims to
“make sense of what emerges under conditions of complexity, documenting and
interpreting the dynamics, interactions, and interdependencies that occur as innova-
tions unfold” (Patton, 2011, p. 7). The evaluative focus is on the processes by which
innovations come about and the learning that may emerge from such innovations.
We contend that this trend also relates to improvement science (Christie, Inkelas,
& Lemire, 2017) in that the central focus is on program design, innovation, and repli-
cation within complex environments. It also relates to rapid cycle evaluation (Cody
& Asher, 2014), because these methods both involve continuous program improve-
ment through iterative cycles of program development, systematic data collection,
learning, and feedback.
Recent developments in this area also include outcome mapping and outcome
harvesting, both of which are participatory approaches. Outcome mapping relies on
stakeholder involvement for program planning, design, evaluation, and monitor-
ing, emphasizing the collaborative establishment of intended outcomes and corre-
sponding targets (Jones & Hearn, 2009). Outcome harvesting focuses on stakeholder
involvement for identifying which outcomes (intended and unintended, positive
and negative) were realized by a program of interest (Wilson-Grau & Britt, 2012).
These latter offshoots connect in several ways to the methods trend relating to how
and why programs work, with many of the methods and approaches increasingly
relying on active stakeholder involvement.
These collaborative approaches can be structured in a number of ways, includ-
ing: (1) participatory action research, which involves the planning, design, con-
duct, and use of research with the people whose experiences are under study; (2)
researcher-practitioner partnerships, which involve the collaborative development
of research projects and integrating the use of evidence among stakeholders; and (3)
communities of practice, which involve the collaboration of stakeholders grouped
by occupational focus or domain expertise.
From a learning perspective, active participation of stakeholders in the evalua-
tion process comes with potential benefits. Including stakeholders in the design and
implementation of the evaluation promotes ownership and buy-in of the evaluation,
which in turn provides the potential for more useful and actionable findings for
decision makers and practitioners.
Nevertheless, it is still quite common for evaluations to take place with an out-
side, “objective technician” who is not a program participant or insider of any sort.
From a funder perspective, results from an objective technician evaluator will often
have greater credibility than those from a participatory evaluation because they are
not influenced by special interests of participants or program staff, for example. We
recognize that these two models for evaluation—participatory versus objective tech-
nician—are the source of some tension within the field. To us, both models seem
appropriate in certain circumstances. For example, if the evaluation is being conducted (and perhaps paid for) by the program itself, then the participatory model seems fitting. In contrast, if the evaluation is being conducted for or by an outsider (a current or potential future funder), then the participatory model seems inappropriate. There are settings and situations where engaging research subjects in an evalu-
ation might jeopardize the quality of an evaluation; and similarly there are settings
and situations where not engaging research subjects might do so. Collecting data
from among very vulnerable populations (such as the homeless), for instance, can
be made much more effective by including those individuals in the data collection
design (here, again, is a connection to the valuing branch, which we discuss next).
In brief, it is worth considering the trade-offs and choosing the right model for the
questions and associated evaluation at hand.

The Valuing Branch

The most pronounced trend in the valuing branch is the growing attention to
cultural awareness and cultural competence in evaluation practice.
Valuing Trend: Cultural Awareness in Evaluation Practice. A major area of scholarly and
practical discussion centers on how culture is conceptualized and defined in making
evaluative judgments and how evaluation practitioners, scholars, and/or policymakers
consider cultural context in practice (e.g., Chouinard & Hopson, 2015). Cultural
awareness is fundamental, especially in evaluation contexts such as
international development evaluation and evaluation in indigenous communities,
where the dominance of Western epistemological approaches to knowledge
production is particularly problematic.
One notable example of mainstreaming cultural awareness in evaluation is
visible in the American Evaluation Association’s Guiding Principles for Evaluators.
Revised in 2018, the principles now emphasize cultural competence as a core ele-
ment of professional practice. According to the standards, “culturally competent”
evaluators are defined as those “who draw upon a wide range of evaluation theories
and methods to design and carry out an evaluation that is optimally matched to the
context. In constructing a model or theory of how the evaluation operates, the evalu-
ator reflects the diverse values and perspectives of key stakeholder groups, for good
practice” (American Evaluation Association, 2018, p. 3).
The growing integration of cultural competence in evaluation is evident in a
Centers for Disease Control and Prevention (Gervin et al., 2014) guide on practical
strategies for cultural competence in evaluation. These strategies include surfacing and examining cultural biases and assumptions at every stage of the evaluation; reflecting on how professional and personal backgrounds may influence analyses and interpretations of data; and considering the influence of these factors when drawing evaluative conclusions.

Another government effort that flags cultural competence in evaluation practice
is that of an explicit choice by the U.S. Department of Health and Human Services,
Administration for Children and Families, Office of Planning, Research, and
Evaluation (OPRE) to fund a tribal-specific evaluation of a major job training program, con-
sidering the tribal grantees as distinct and worthy of their own evaluation, driven
by tribal members. OPRE published a set of associated principles to guide research
with tribal communities (Meit et al., 2017).
The mainstreaming of cultural competence and the general commitment to
cultural awareness that this trend reflects is an important development that we
expect will continue to shape the practice and profession of evaluation in the future.
Cultural competence and awareness are no longer viewed as peripheral to eval-
uation but are increasingly recognized as fundamental premises for good evalua-
tion practice. Cultural awareness has a particularly welcome home in participatory
and collaborative evaluation approaches, but its relevance extends to all models for
evaluation.

A Revised Evaluation Tree

To summarize our assessment of the new growth on the evaluation tree, we
have identified six major trends across the three main branches: (1) big data ana-
lytics, (2) understanding how and why programs work, (3) complexity theory and
systems thinking, (4) evaluation capacity building, (5) collaborative/participatory
evaluation approaches, and (6) cultural awareness in evaluation practice.
These trends appear in the revised evaluation tree, as illustrated in Figure 2.
Our assessment describes an evaluation tree that is still growing yet now in its
prime of life. The tree is characterized by a sturdy trunk flanked by strong
branches and a multitude of offshoots. Over the years, repeated pruning and natural
decay have removed low, dead branches, allowing fresh and healthy growth.
Having grown from early seeds in assessment and auditing, nurtured by legislative
demands for evaluative information about social programs, and pruned by fiscal
conservatism and natural decay, the evaluation tree has matured and developed a
full crown. The tree continues to bear fruit: scattered twigs and fresh offshoots bring
new life and promote growth in new directions.

The Evaluation Tree in the Policy Analysis Forest: Advancing Toward Transdisciplinary Practice

The purpose of this article has been to consider recent trends in evaluation and
to situate these trends within the broader field of policy analysis. As our explora-
tion of the evaluation tree in the preceding pages illustrates, the development of the
practice and profession of evaluation has in fundamental ways been influenced by
changing needs and demands for evidence in society at large and in federal, state,
and local government more specifically.
Figure 2. Recent Trends on the Evaluation Tree. Source: Modified from Christie and Alkin (2013).

This directs our attention to the central role of evaluation—and other types of
knowledge production—in providing actionable evidence for use in public policy
and program decision  making. This actionable evidence role affects every aspect
of evaluation: the methods by which evidence about social programs and policies
is derived (the types of evidence called for), the valuing of the merit and worth of
social policies and programs (the standards and criteria used), and the potential use
(or misuse) of these judgments as reflective of and responsive to policy and program
decision making. From this perspective, trends in evaluation—along with trends in
any other type of knowledge production—should be viewed within the broader
movement to produce actionable evidence for political decision-making processes.
In reflecting on the six identified trends and what they tell us about develop-
ments in knowledge production for policy decision making, this concluding discus-
sion makes three observations:

1. Evaluation is one type of knowledge production and one form of evidence.
2. Knowledge workers, practitioners, and policymakers are integral to promoting evidence use.
3. Solving social problems demands purposeful integration of multiple disciplinary perspectives.

We see these observations reflected in the identified trends in evaluation. The
observations also cut across and reach beyond the individual branches of the
evaluation tree, situating the tree within the policy analysis forest. We consider
each observation in turn.

Observation 1: Evaluation Is One Type of Knowledge Production and One Form of Evidence

Evaluation—conceptualized here as a type of policy analysis that uses systematic
data collection to determine the worth of a program or policy—is one component
of a broader machinery of knowledge production that informs political decision
making. Likewise, evaluators—just like policy analysts, auditors, and management
consultants—are part of the collective of knowledge workers supporting
policymakers, program funders, program staff, and other stakeholders in making
decisions about social policies and programs. Underlying these professions and
professionals is a shared commitment to producing evidence for policy and program
decision making.
Evaluation generates just one form of evidence. As discussed, evidence in the
early days of evaluation was primarily defined in terms of findings from experi-
mental and quasi-experimental designs. However, needed types of evidence have
always been broader. Over time, “evidence” has grown accordingly to include a
much broader range of designs, evaluation approaches, and types of data. The
emergence of big data analytics and the shift toward understanding how and why
programs work are both reflective of the broad and still broadening range of evi-
dence. Combined, these trends focus not just on program outcomes and impacts
but also on program processes and the relationships among processes, outcomes,
and impacts.
The expansion of the evidence base makes the question of what counts as cred-
ible evidence even more topical. This is a longstanding debate in which consensus
seems unlikely (Donaldson, Christie, & Mark, 2009). Our position is that we should
continue to expand the definition of credible evidence, but with attention to rigor
and policy relevance. There is value in the potential of big data analytics, among
other methodological trends, if and only if they are coupled with critical and sound
methodological reasoning. We also contend that what counts as credible evidence
depends on the type of evidence needed. If the aim of the evaluation is to deter-
mine a policy or program’s impact and quantify the magnitude of findings, then an
experimental design is likely preferable. When designed and implemented well, the
experimental approach produces the least biased net-effect estimate, effectively iso-
lating the contribution of the program to the outcomes of interest. Further, as noted
earlier, advances in experimental design and analysis have the potential to answer
more nuanced questions now than was true in the past. However, if the main aim of
an evaluation is to understand not just whether but how and why a program works,
to surface the inner workings of the program, then a theory-oriented evaluation
may be especially appropriate and credible for that purpose.
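To make the net-effect logic concrete, here is a minimal illustrative sketch; the notation is ours and is not drawn from the sources cited in this article. Suppose participants are randomly assigned to a program group (T = 1) or a control group (T = 0), and let Y denote the outcome of interest. The experimental impact estimate is then simply the difference in mean observed outcomes between the two groups,

\[ \hat{\Delta} = \bar{Y}_{T=1} - \bar{Y}_{T=0}, \]

which, because random assignment makes the two groups comparable in expectation, attributes the difference to the program rather than to preexisting differences between participants and nonparticipants.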

Observation 2: Knowledge Workers, Practitioners, and Policymakers Are Integral to Promoting Evaluation Use

Our second observation is that the broadening range of evidence is accompanied by
a closer integration of policymakers, practitioners, and knowledge workers around the
use of evidence in decision  making. Within evidence-based policy and practice,
there continues to be a movement toward creating institutionalized structures that
can support the use of evidence. As described earlier, the development of evalua-
tion has on more than one occasion been driven by legislative demand. The U.S.
Commission on Evidence-Based Policymaking and the Foundations for Evidence-
Based Policymaking Act of 2018 are recent examples. These legislative drivers rec-
ognize the important role of evaluation in government by directing agencies to
designate evaluation officers, create written evaluation policies, and establish multi-
year learning agendas to outline evaluation priorities. Moreover, recent trends re-
lated to evaluation capacity building and participatory evaluation approaches
reflect a move toward knowledge workers, practitioners, and policymakers being
integral to the use of evidence.
Despite continuous progress and commitment to the use of evidence, there is
still room to grow in embedding evidence-related practices into organizations’
routine operations. As documented in a 2013 report from the GAO, fewer
than half of 24 agencies surveyed conducted any evaluations of their programs, and
only seven had a centralized leader with responsibility for overseeing evaluation
activities (cited in National Academies of Sciences, Engineering, and Medicine,
2017). With the bipartisan Foundations for Evidence-Based Policymaking Act, this
may be changing in the near future. Indeed, the recent trends in the evaluation field
toward capacity building and collaborative evaluation approaches imply a future in
which knowledge workers, practitioners, and policymakers will advance evaluation use.

Observation 3: Solving Social Problems Demands Purposeful Integration of Multiple Disciplinary Perspectives

Our third observation is that the problems of our world cannot be addressed by a
single discipline. Consider climate change, large-scale human migration, and global
inequality: to solve any of these challenges, we need to defy disciplinary boundar-
ies. Evaluation requires the work of methodologists, recruitment specialists, data
collectors, statisticians, economists, writers, and communications specialists—and
often in small evaluation shops, these roles must be filled by a single person (Peck,
2018). The academic disciplines from which evaluators come to take on these varied
roles include anthropology, economics, political science, psychology, and sociology.
These disciplines bring their own lenses and tools that not only facilitate evaluation
work generally but also need to come together if we are to take on society’s biggest,
most vexing challenges.
Figure 3.  Types of Disciplinary Integration. Source: Jensenius (2012).

Moreover, because “at-risk” populations exist at the intersections of the health
care, educational, housing, child welfare, justice, and workforce development
systems (among others), it is imperative that evaluations cross the boundaries of
these policy areas and take a holistic approach to understanding problems.
Complexity theory and systems thinking speak to the recognition that the world’s
problems, and the programs and policies designed to address them, are embedded
within broader systems.
In working across occupational, disciplinary, and systemic boundaries, the
profession and practice of evaluation can produce knowledge that leads to
stronger, more coordinated interventions with the potential to improve the
lives of those most in need. As Stember (1991) proposed (see Figure 3), in addition
to intradisciplinary work (within one discipline), this type of knowledge production
may be:

• Multidisciplinary (i.e., people from varied disciplines working with one another);
• Cross-disciplinary (i.e., viewing one discipline from the perspective of another);
• Interdisciplinary (i.e., integrating knowledge and methods from varied disciplines); and/or
• Transdisciplinary (i.e., creating a unity of methods and frameworks above and beyond the individual disciplines).

By these definitions, evaluation is by its nature at least both multidisciplinary
and interdisciplinary: evaluation work requires teams whose members have the
distinct perspectives and skills to tackle the various roles required in a given
project. Further, the policy areas in which evaluation takes place imply that
evaluation is ideally a transdisciplinary endeavor. In the service of knowledge to
inform program and policy improvement, any one discipline may bring its own
value, but that discipline needs to be responsive to the knowledge demands of the
evaluation; the value of a disciplinary perspective is ultimately judged by its
relevance for policy or practice.
The roots of the evaluation tree and the academic disciplines based on them
provide a foundation for addressing societal challenges. The future will require
transdisciplinary approaches to complex challenges, and all elements of the tree’s
branches will need to come together to provide the most rigorous and relevant
knowledge to address them well.
Sebastian Lemire is an associate at Abt Associates. As a scholar-practitioner, his
area of research revolves around theory-based evaluation, alternative approaches to
impact evaluation, and systematic evidence reviews.
Laura R. Peck is a principal scientist at Abt Associates. A policy analyst by training,
Dr. Peck specializes in innovative ways to estimate program impacts in experimen-
tal and quasi-experimental evaluations, and she applies this to many social safety
net programs.
Allan Porowski is a principal associate at Abt Associates. His research primarily
focuses on evaluations of education and public health interventions, as well as sys-
tematic evidence reviews.

Notes

The authors are grateful for the generous feedback provided by the many colleagues at Abt Associates
who read and commented on earlier versions of the article. These include John Hitchcock and Christopher
Weiss along with several participants in Abt’s Work in Progress Seminar. Most notably, Jacob Klerman
offered thoughtful feedback and guidance throughout the writing process to improve every aspect of the
article, and we sincerely thank him. The article also benefitted from constructive input from two anony-
mous reviewers. Any omissions and errors are the authors’.
1. GPRA focused on outcomes, which are a focus of some evaluation research. That said, summative eval-
uations instead focus on impacts, which are the difference in outcomes that are attributable to the program
or policy being evaluated. Barnow and Smith (2004) argued that GPRA’s incentives—at least in the
job training arena—served to reduce federal government focus on impacts by placing its emphasis on
performance management outcomes.
2. As noted by one of the reviewers, the meaning of the term valuing in the context of evaluation is dis-
tinct from the use of the term in the context of cost-benefit analyses, where the costs and outcomes
of an intervention are translated (and valued) in monetary terms and compared as a ratio. In the context of evaluation,
valuing refers to the act of attributing a value to a program. This act of valuing may of course involve
(but is by no means limited to) cost-benefit analyses (and outcomes expressed in monetary terms).
3. One such example is that of the U.S. Department of Health and Human Services, Administration for
Children and Families’ Evaluation Policy, available at https://www.acf.hhs.gov/opre/resource/acf-evaluation-policy.

References

Abt Associates. 2017. “Abt Associates to Tackle Vector-Borne Diseases Worldwide.” News Post. https​://
www.abtas​socia​tes.com/who-we-are/news/news-relea​ses/abt-assoc​iates-to-tackle-vector-borne-
disea​ses-world​wide-wins-major​. Accessed October 29, 2019.
Alkin, Marvin C., and Jean A. King. 2016. “The Historical Development of Evaluation Use.” American
Journal of Evaluation 37 (4): 568–79.
American Evaluation Association. 2018. The 2018 American Evaluation Association Guiding Principles for
Evaluators. https​://www.eval.org/p/cm/ld/fxml:id=51. Accessed October 29, 2019.
Bamberger, Michael. 2016. Integrating Big Data into the Monitoring and Evaluation of Development Programmes.
http://unglo​balpu​lse.org/sites/​defau​lt/files/​Integ​ratin​gBigD​ata_intoM​EDP_web_UNGP.pdf.
Accessed October 29, 2019.
Barnow, Burt S., and Jeffrey A. Smith. 2004. “Performance Management of US Job Training Programs:
Lessons from the Job Training Partnership Act.” Public Finance & Management 4 (3): 247–87.
Beach, Derek, and Rasmus B. Pedersen. 2013. Process Tracing Methods: Foundations and Guidelines. Ann
Arbor, MI: The University of Michigan Press.
15410072, 2020, S1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/psj.12387 by Universitat Politecnica De Valencia, Wiley Online Library on [27/10/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
S68 Policy Studies Journal, 48:S1

Bell, Stephen H., and Laura R. Peck. 2016. “On the ‘How’ of Social Experiments: Experimental Designs for
Getting Inside the Black Box.” New Directions for Evaluation 152: 97–107.
Bell, Winifred. 1983. Contemporary Social Welfare. New York, NY: MacMillan.
Bloom, Howard S., Carolyn J. Hill, and James A. Riccio. 2003. “Linking Program Implementation and
Effectiveness: Lessons from a Pooled Sample of Welfare-to-Work Experiments.” Journal of Policy
Analysis and Management 22 (4): 551–75.
Bourgeois, Isabelle, and J. Bradley Cousins. 2013. “Understanding Dimensions of Organizational
Evaluation Capacity.” American Journal of Evaluation 34 (3): 299–319.
Buckley, Jane, Thomas Archibald, Monica Hargraves, and William M. Trochim. 2015. “Defining and
Teaching Evaluative Thinking: Insights from Research on Critical Thinking.” American Journal of
Evaluation 36 (3): 375–88.
Campbell, Donald T. 1969. “Reforms as Experiments.” American Psychologist 24: 409–29.
Campbell, Donald T., and Julian C. Stanley. 1963. Experimental and Quasi-Experimental Designs for Research.
Belmont, CA: Wadsworth.
Chen, Huey T., and Peter H. Rossi. 1983. “Evaluating with Sense: The Theory-Driven Approach.”
Evaluation Review 7 (3): 283–302.
Cho, Sung-Woo. 2016. “The Tension between Access and Success: Challenges and Opportunities for
Community Colleges in the United States.” In Widening Higher Education Participation: A Global
Perspective, ed. Mahsood Shah, Anna Bennett, and Erica Southgate. Oxfordshire, UK: Elsevier Ltd.
Chouinard, Jill, and Rodney Hopson. 2015. “Toward a More Critical Exploration of Culture in International
Development Evaluation.” Canadian Journal of Program Evaluation 30 (3): 248–76.
Christie, Christina A., and Marvin C. Alkin. 2013. “An Evaluation Theory Tree.” In Evaluation Roots:
Tracing Theorists’ Views and Influences, 2nd ed., ed. Marvin C. Alkin. Thousand Oaks, CA: SAGE
Publications, 11–57.
Christie, Christina A., Moira Inkelas, and Sebastian Lemire. 2017. “Improvement Science in Evaluation:
Methods and Uses.” New Directions for Evaluation 153: 1–103.
Christie, Christina A., and Sebastian Lemire. 2019. “Why Evaluation Theory Should Be Used to Inform
Evaluation Policy.” American Journal of Evaluation 40 (4): 490–508. https://doi.org/10.1177/1098214018824045.
Cody, Scott, and Andrew Asher. 2014. Proposal 14: Smarter, Better, Faster: The Potential for Predictive
Analytics and Rapid-Cycle Evaluation to Improve Program Development and Outcomes. http://www.
hamil​tonpr​oject.org/paper​s/predi​ctive_analy​tics_rapid-cycle_evalu​ation_impro​ve_progr​am_out-
comes. Accessed October 29, 2019.
Cousins, Brad J., Swee C. Goh, Catherine J. Elliott, and Isabelle Bourgeois. 2014. “Framing the Capacity to
Do and Use Evaluation.” New Directions for Evaluation 141: 7–23.
Donaldson, Stewart I., Christina A. Christie, and Melvin M. Mark. 2009. What Counts as Credible Evidence
in Applied Research and Evaluation Practice. Thousand Oaks, CA: Sage.
Fabra-Mata, Javier, and Jesper Mygind. 2019. “Big Data in Evaluation: Experiences from Using Twitter
Analysis to Evaluate Norway’s Contribution to the Peace Process in Colombia.” Evaluation 25 (1):
6–22.
Forss, Kim, Mita Marra, and Robert Schwartz. 2011. Evaluating the Complex: Attribution, Contribution, and
Beyond. New Brunswick, NJ: Transaction Publishers.
Friedman, Willa, Benjamin Woodman, and Minki Chatterji. 2015. “Can Mobile Phone Messages to Drug
Sellers Improve Treatment of Childhood Diarrhoea? A Randomized Controlled Trial in Ghana.”
Health Policy and Planning 30 (Suppl 1): i82–92.
Funnell, Sue C., and Patricia J. Rogers. 2011. Purposeful Program Theory: Effective Uses of Theories of Change
and Logic Models. San Francisco, CA: Jossey-Bass.
Comptroller General. 1980. Federal Evaluations (1980 Congressional Sourcebook Series PAD-80-48).
Washington, DC: U.S. Government Printing Office.
Gervin, Derrick, Robin Kuwahara, Rashon Lane, Sarah Gill, Refilwe Moeti, and Maureen Wilce. 2014.
Practical Strategies for Culturally Competent Evaluation. Atlanta, GA: Centers for Disease Control and
Prevention, U.S. Department of Health and Human Services.
Jensenius, Alexander R. 2012. Disciplinarities: Intra, Cross, Multi, Inter, Trans. Blog post. https​://www.arj.
no/2012/03/12/disci​plina​rities-2/. Accessed October 29, 2019.
Jones, Harry, and Simon Hearn. 2009. Outcome Mapping: A Realistic Alternative for Planning, Monitoring,
and Evaluation. https​://www.odi.org/sites/​odi.org.uk/files/​odi-asset​s/publi​catio​ns-opini​on-files/​
5058.pdf. Accessed October 29, 2019.
Kubisch, Anne C. 2010. “Recent History of Community Change Efforts in the United States” In Voices
from the Field III: Lessons and Challenges from Two Decades of Community Change Efforts, ed. Anne C.
Kubisch, Patricia Auspos, Prudence Brown, and Tom Dewar. Washington, DC: Aspen Institute.
Labin, Susan S., Jennifer L. Duffy, Duncan C. Meyers, Abraham Wandersman, and Catherine A. Lesesne.
2012. “A Research Synthesis of the Evaluation Capacity Building Literature.” American Journal of
Evaluation 33 (3): 307–38.
Lemire, Sebastian, and Gustav J. Petersson. 2017. “Big Bang or Big Bust? The Role and Implications of Big
Data in Evaluation.” In Cyber Society, Big Data, and Evaluation, ed. Gustav J. Petersson, and Jonathan
D. Breul. New Brunswick, NJ: Transaction Publishers, 215–36.
Meit, Michael, Carol Hafford, Catharine Fromknecht, Noelle Miesfeld, Tori Nadel, and Emily Phillips.
2017. Principles to Guide Research with Tribal Communities: The Tribal HPOG 2.0 Evaluation in Action.
OPRE Report OPRE 2017–61. Washington, DC: U.S. Department of Health and Human Services,
Administration for Children and Families, Office of Planning, Research, and Evaluation.
National Academies of Sciences, Engineering, and Medicine. 2017. Principles and Practices for Federal
Program Evaluation: Proceedings of a Workshop. Washington, DC: The National Academies Press.
Nielsen, Steffen B., Sebastian Lemire, and Majbrit Skov. 2011. “Measuring Evaluation Capacity—Results
and Implications of a Danish Study.” American Journal of Evaluation 32 (3): 324–44.
Patton, Michael Q. 1997. Utilization-Focused Evaluation: The New Century Text. Thousand Oaks, CA: Sage
Publications.
   . 2011. Developmental Evaluation—Applying Complexity Concepts to Enhance Use and Innovation. New
York, NY: Guilford Press.
Pattyn, Valérie, Astrid Molenveld, and Barbara Befani. 2017. “Qualitative Comparative Analysis as
an Evaluation Tool: Lessons from an Application in Development Settings.” American Journal of
Evaluation 40 (1): 55–74.
Pawson, Ray, and Nick Tilley. 1997. Realistic Evaluation. Thousand Oaks, CA: SAGE Publications.
Peck, Laura R. 2003. “Subgroup Analysis in Social Experiments: Measuring Program Impacts Based on
Post Treatment Choice.” American Journal of Evaluation 24 (2): 157–87.
   . 2013. “On Analysis of Symmetrically-Predicted Endogenous Subgroups: Part One of a Method
Note in Three Parts.” American Journal of Evaluation 34 (2): 225–36.
   . 2015. “Using Impact Evaluation Tools to Unpack the Black Box and Learn What Works.” Journal
of Multidisciplinary Evaluation 11 (24): 54–67.
   . 2018. “The Big Evaluation Enterprises in the United States.” New Directions for Evaluation 160: 97–124.
   . 2020. Experimental Evaluation Design for Program Improvement. Thousand Oaks, CA: SAGE
Publications.
Peck, Laura R., and Lindsey M. Gorzalski. 2009. “An Evaluation Use Framework and Empirical
Assessment.” Journal of Multidisciplinary Evaluation 6 (12): 139–56.
Peck, Laura R., Yushim Kim, and Joanna Lucio. 2012. “An Empirical Examination of Validity in
Evaluation.” American Journal of Evaluation 33 (3): 350–65.
Preskill, Hallie. 1994. “Evaluation’s Role in Enhancing Organizational Learning: A Model for Practice.”
Evaluation and Program Planning 17 (3): 291–97.
Preskill, Hallie, and Shanelle Boyle. 2008. “A Multidisciplinary Model of Evaluation Capacity Building.”
American Journal of Evaluation 29 (4): 443–59.
Pressman, Jeffrey L., and Aaron Wildavsky. 1984. Implementation: How Great Expectations in Washington
Are Dashed in Oakland; Or, Why It’s Amazing that Federal Programs Work at All, This Being a Saga of the
Economic Development Administration as Told by Two Sympathetic Observers Who Seek to Build Morals on
a Foundation of Ruined Hopes: The Oakland Project, 3rd ed. Berkeley, CA: University of California Press.
Raftree, Linda, and Michael Bamberger. 2014. Emerging Opportunities: Monitoring and Evaluation in a Tech-
Enabled World. London, UK: ITAD and The Rockefeller Foundation. https​://assets.rocke​felle​rfoun​
dation.org/app/uploa​ds/20150​91112​2413/Monit​oring-and-Evalu​ation-in-a-Tech-Enabl​ed-World.
pdf. Accessed October 29, 2019.
Ragin, Charles C. 2014. The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies.
Oakland, CA: University of California Press.
Schmitt, Johannes. Forthcoming. “Causal Mechanisms in Evaluation.” New Directions for Evaluation.
Scriven, Michael. 1991. Evaluation Thesaurus, 4th ed. Newbury Park, CA: Sage.
   . 2019. “The Checklist Imperative.” New Directions for Evaluation 163: 49–60.

Shadish, William R., Thomas D. Cook, and Laura C. Leviton. 1991. Foundations of Program Evaluation:
Theories of Practice. Newbury Park, CA: Sage.
Springer, J. Fred, Peter J. Haas, and Allan Porowski. 2017. Applied Policy Research: Concepts and Cases, 2nd
ed. New York, NY: Routledge.
Springer, J. Fred, and Allan Porowski. 2012. Natural Variation Logic and the DFC Contribution to Evidence-
Based Practice. Washington, DC: Presentation to the Society for Prevention Research.
Stember, Marilyn. 1991. “Advancing the Social Sciences through the Interdisciplinary Enterprise.” The
Social Science Journal 28 (1): 1–14.
U.S. Government Accountability Office. 1992. Evaluation Issues. Washington, DC: Comptroller General.
Vanderkruik, Rachel, and Marianne E. McPherson. 2017. “A Contextual Factors Framework to Inform
Implementation and Evaluation of Public Health Initiatives.” American Journal of Evaluation 38 (3):
348–59.
Weimer, David, and Aidan R. Vining. 2017. Policy Analysis: Concepts and Practice, 6th ed. New York, NY:
Routledge.
Weiss, Carol H. 1998. Evaluation. Upper Saddle River, NJ: Prentice-Hall.
Westbrook, T’Pring R., Sarah A. Avellar, and Neil Seftor. 2017. “Reviewing the Reviews: Examining
Similarities and Differences Between Federally Funded Evidence Reviews.” Evaluation Review 41
(3): 183–211.
Williams, Bob, and Richard Hummelbrunner. 2011. System Concepts in Action: A Practitioner’s Toolkit.
Stanford, CA: Stanford University Press.
Wilson-Grau, Ricardo, and Heather Britt. 2012. Outcome Harvesting. Ford Foundation. https​://www.
outco​memap​ping.ca/downl​oad/wilso​ngrau_en_Outom​e%20Har​vesti​ng%20Bri​ef_revis​ed%20Nov​
%202013.pdf. Accessed October 29, 2019.
