Policy Studies Journal - 2020 - Lemire - The Growth of The Evaluation Tree in The Policy Analysis Forest Recent
See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Policy Studies Journal, Vol. 48, No. S1, 2020
The practice and profession of evaluation is continually evolving. From its early origin in the Great
Society years of the 1960s, through its golden years of the 1970s, its transformation under the fiscal
conservatism of the Reagan era in the 1980s, to its maturation during the performance and results
era of the 1990s, the field of evaluation continues to evolve in response to broader trends in society. This
article examines recent developments and trends in the practice and profession of evaluation. Structured
around the evaluation theory tree, the presentation of these developments elaborates on the three main
branches of evaluation: methods, use, and valuing. The concluding discussion briefly addresses the
central role of evaluation—and other types of knowledge production—in providing actionable evidence
for use in public policy and program decision making.
KEY WORDS: evaluation, policy analysis, public policy, history of evaluation, evaluation theory tree
Introduction
doi: 10.1111/psj.12387
© 2020 The Authors. Policy Studies Journal published by Wiley Periodicals, Inc. on behalf of Policy Studies Organization
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits
use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or
adaptations are made.
As readers know, policy analysis falls within the political science subfield of policy
studies, which addresses both the policy process and analysis activities within that
process. As Weimer and Vining (2017) define it, “policy analysis is client-oriented
advice relevant to public decisions informed by social values” (p. 30). Broader than
evaluation, policy analysis includes almost any analysis that aims to inform pol-
icy decisions. Evaluation is a particular type of policy analysis that uses systematic
data collection and analysis to determine the worth—often the formative or sum-
mative effectiveness—of a program or policy. Formative evaluation considers pro-
gram processes, and summative evaluation examines a program or policy’s ultimate
outcomes and impacts. The overlapping but distinct fields of evaluation and policy
analysis make the exercise of mapping the evolution and current state of evaluation
topical and relevant to program evaluators and policy analysts alike.
The article proceeds in three parts. The first part briefly describes the histor-
ical development of evaluation in relation to broader social, economic, political,
and technological developments. Informed by these observations, the second part
focuses on what we see as the most salient recent developments in the practice and
profession of evaluation. With the aim of providing a coherent presentation of these
diverse trends, we use the “evaluation theory tree” (Christie & Alkin, 2013) as an
organizing framework, categorizing the identified trends according to the tree’s
three main branches: methods, use, and valuing. Part three discusses implications of
our observed trends for policy analysis.
Before proceeding to this discussion, we note that the methodological foun-
dation of the article is primarily anecdotal, reflecting the combined experience,
scholarship, and curiosity of the authors, as well as input from experienced colleagues. Collectively, as applied evaluation practitioners and scholars, we have spent over 60 years as active and committed members of the policy analysis and evaluation
communities. We have designed and implemented evaluations across varied policy
areas—including employment, welfare, education, housing, food assistance, mar-
ket development, and public health—and varied geographic regions, including the
United States, Afghanistan, Denmark, Ghana, and Nigeria. Over the course of our
many years of practice, we have held positions in academia, consulting manage-
ment firms, government research units, non-profit agencies, and for-profit research
organizations. Through a range of evaluation projects, conference presentations,
journal publications, guest editorials of special issues, hallway conversations, and
book authorship (e.g., Peck, 2020; Springer, Haas, & Porowski, 2017), we have also
had the opportunity to engage with a broad range of evaluation practitioners and
commissioners. The topics and trends we highlight in the following pages reflect
these engagements and experiences within the evaluation and policy analysis fields.
about the program, improve program effectiveness, and/or inform decisions about
future programming” (p. 23). This definition speaks directly to three defining char-
acteristics of evaluation: the use of systematic data collection (methods), the central
role of making judgments about programs and policies (valuing), and the central
purpose of informing decision making (use). We explore and expand upon these
three aspects of evaluation later in this article. As we show, methods, valuing, and
use have been fundamentally influenced and shaped by evaluation’s embedded role
in knowledge production for policy and program decision making.
Throughout its history, evaluation has been intrinsically connected to public
policy, policymaking, and policy analysis. The public policy–evaluation connec-
tion originated in the 1950s, with the birth of modern evaluation. Within the federal
government, increased attention to operations research, primarily carried out by
planning and systems analysts, economists, budget specialists, and internal audi-
tors, planted the seeds for modern-day evaluation (Weiss, 1998). During this time
period—and fueled (at least in part) by the race against the rapidly developing space
program of the Soviet Union—evaluation primarily centered on the assessment of
educational outcomes of American youth. With the goal of boosting scientific lit-
eracy, the federal government sponsored a broad range of curriculum efforts (e.g.,
the Harvard Project Physics), and funding for evaluation of these initiatives soon
followed (Weiss, 1998). The seeds of evaluation are deeply rooted in operations
research and assessment supporting the federal government.
These early seeds found a nurturing environment in the Great Society years
(roughly from the early 1960s to the mid-1970s), a period characterized by unprec-
edented growth in the size and reach of federal programs. Under the banner of the
War on Poverty, federal investments in public aid programs in education, housing,
and workforce development, for example, rose from $23.5 billion in 1950 to $428.3
billion in 1979, an increase of 600 percent in constant dollars (Bell, 1983, cited in
Shadish, Cook, & Leviton, 1991). As Shadish et al. (1991) observe, “most programs
were launched with high hopes, great dispatch, and enormous financial invest-
ments” (p. 22). In tandem with these federal investments, the demand for evidence
of the effectiveness of social programs and policies quickly arose from the public
and from legislators—social accountability became pressing. During this time, the
federal government’s demand for evaluation services continued to rise. Evaluation
was becoming a growth industry (Shadish et al., 1991).
The demand for evaluation generally concerned large-scale, multi-year demon-
stration projects, focusing primarily on value for investment: the extent to which
federally funded programs were making a difference, and at what cost. As described
in Campbell’s (1969) seminal work on reforms as experiments, three mechanisms com-
prised the underlying logic of using evaluation for social programing: (1) generate
program variations, (2) select the variant that reduces the problem the most (multi-
arm outcome evaluation), and (3) retain knowledge about effective programs. In line
with this thinking, and informed by Campbell and Stanley's (1963) work on experimental designs, the research designs of choice were experimental and, to some extent, quasi-experimental designs, both well suited to answering questions about program impacts. The evaluations of the Juvenile Delinquency
Program (1962), the Manpower Development and Training Act of 1962, the anti-pov-
erty programs under the Economic Opportunity Act of 1964, and the Title I educa-
tional evaluation requirement under the Elementary and Secondary Education Act
(1965) are a few of the often-cited landmark evaluations of this era. Evaluation was
largely driven by legislative demand.
In response to this demand, the internal evaluation capacity of the federal
government was found lacking. While internal evaluation staff and even evalua-
tion units were beginning to emerge, the scope and size of the evaluations needed
reached beyond the evaluation capacity within the federal government. Moreover,
the need for “independent” and “objective” evaluation of government programs
further motivated contracting out evaluation work. In response, research organi-
zations such as Abt Associates, Mathematica, and MDRC, among others, emerged
in the mid-1960s, specializing in designing and implementing high-quality, rigor-
ous social experiments of large-scale, multi-year demonstration projects (e.g., Peck,
2018). Based on a survey across federal agencies, the General Accounting Office
(GAO; now the Government Accountability Office) identified 5,610 completed fed-
eral evaluations from 1973 to 1979, averaging 935 completed evaluations per year
(Comptroller General, 1980). The trunk of an evaluation tree was formed.
The first pruning of evaluation came in the early 1980s with the emergence
of fiscal conservatism, the roll-back of federal spending on social programs, and
the concomitant roll-back of evaluation spending. The political environment had
shifted, and that shift’s impact on the demand for evaluation services was immedi-
ate. Federal funds earmarked for evaluation decreased by 37 percent between 1980
and 1984 (Shadish et al., 1991, p. 27). The reduction was not solely due to fiscal
conservatism. Although previous decades had great hopes for evaluation’s abil-
ity to determine what works, the early demonstration programs often delivered
mixed findings or even null impacts (Pressman & Wildavsky, 1984; Weiss, 1998).
Accordingly, a growing skepticism about the effectiveness of social programs (as
well as the ability of evaluation to identify effective programs) contributed to the
widespread cutbacks in federal expenditures for evaluation (Shadish et al., 1991).
Interestingly, while the overall funding for evaluation was reduced, the number of federally funded evaluations remained more or less the same; many evaluations were now smaller and internally focused, with fewer large, multi-year evaluations being contracted out (Shadish et al., 1991, p. 27). Accordingly, a larger
number of evaluations took place in house, a trend made possible by the accumula-
tion of internal evaluation capacity within the federal government over the preced-
ing decades.
Toward the end of the 1980s and into the early 1990s, federal government inter-
est in evaluation increased. In a series of reports titled Program Evaluation Issues, the
U.S. GAO (1992) expressed concern about the reduction in the nation’s ability to
support evaluation. Echoing earlier calls for social accountability, the GAO asserted
the importance of evaluation for the federal government by rhetorically asking:
“Are the federal officials who administer programs adequately informed about the
implementation and the results of those investments?” (1992, p. 4). Again, the main
stimulus for evaluation came through legislative action, ushering in a new era of
evaluation and governance. The Government Performance and Results Act (GPRA)
of 1993, one of a series of laws designed to improve government performance while
reducing the budget deficit, signified if not a novel shift then at least a renewed com-
mitment to government-wide performance measurement and management, moni-
toring outcomes over outputs, and managing toward agency- or government-wide
goals. Under GPRA, all federal departments and divisions were required to estab-
lish performance targets, monitor performance against those targets, and report on
their progress (or lack thereof). “Progress” was gauged in terms of achieving target
outcomes as opposed to generating impacts, both of which are a focus of evaluation
work.1
A concurrent development, and one that revisited and revitalized the
“what works” question about impacts of the Great Society era, was the push for
evidence-based practice and policy: the idea that policies and programs should be
grounded in evidence, as opposed to values, ideology, fads, and other persuasions.
Evidence-based policy reform assumes, of course, an evidence base. Building on the
growing number of experimental and high-quality quasi-experimental evaluations,
the intended evidence base was accumulated in government-funded clearinghouses
established in the early 2000s, such as the What Works Clearinghouse, a flagship
initiative for evidence-based practice in education. Other well-known repositories
of evidence include the Best Evidence Encyclopedia, the Campbell Collaboration,
the Clearinghouse for Labor Evaluation and Research, and the California Evidence-
Based Clearinghouse for Child Welfare. The number of clearinghouses and other
repositories of evidence continues to grow (Westbrook, Avellar, & Seftor, 2017).
During the early 2000s, the demand for evaluation was further reinforced by a
number of legislative acts and initiatives in the federal government. The Information
Quality Act of 2001 and the Office of Management and Budget’s Guidelines
for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of
Information Disseminated by Federal Agencies both advise agencies on assuring
data quality (from collection through dissemination) as well as on developing the
appropriate administrative mechanisms to maintain these data standards. In 2002, to
strengthen GPRA, the Office of Management and Budget also created the Program
Assessment Rating Tool—a survey covering program purpose and design, strategic
planning, program management, and program results. In 2003 (then in a refreshed
version in 2007), the U.S. Department of Education, Institute for Education Sciences
issued a statement on Scientifically Based Evaluation Methods that emphasized the
use of experimental and quasi-experimental research designs that allow estimation
of least-biased, causal program impacts. Then, in 2010, GPRA was reaffirmed, and
expanded, in the form of the Government Performance and Results Modernization
Act. More recently, the bipartisan establishment of the U.S. Commission on Evidence-
Based Policymaking, charged with examining how government can make better use
of its existing data to provide evidence for future government decisions, signifies
the sustained integration of evaluation in government decision-making about social
programs and policies. Considered collectively, these legislative mandates, many of
which are bipartisan, speak directly to the federal government’s sustained interest
and investment in evidence-informed decision making.
With the early growth of the evaluation tree described, we begin to see that map-
ping the diverse and evolving landscape of evaluation is a nontrivial task. The reach
of evaluation, the boundaries of our practice, covers a broad range of sectors; an
even broader range of policies, programs, and projects; and a still broadening range
of themes and topics. After all, and as Shadish and colleagues (1991) dryly note, “we
can evaluate anything—including evaluation itself” (p. 19). From this perspective,
and true to the metaphor of flora and fauna, we will need a classification system, a
Systema Naturae in Linnaean terms, to capture and make sense of the blooming field of
evaluation wildflowers, weeds and all.
The evaluation theory tree (Christie & Alkin, 2013) serves well as an organiz-
ing framework for reflecting on recent developments in evaluation. The tree has
become a well-known framework for describing and discussing evaluation theory and practice in evaluation circles. Accordingly, it offers a recognized
framework for those already familiar with the field of evaluation. For those new
to the field (and its literature), which would likely include at least some readers of
the Policy Studies Journal, the evaluation theory tree offers an accessible organizing
framework for understanding three central dimensions of evaluation theory and practice.
The original evaluation theory tree depicts evaluation theorists and scholars
by name. For our purpose of mapping the recent trends in evaluation, we inten-
tionally modify the tree to focus on the themes and approaches that characterize
the branches. The result appears in Figure 1. Following Christie and Alkin (2013),
the roots of the evaluation theory tree—the intellectual foundations from which
evaluation emerged—are described as social accountability (the demand for social
programs to demonstrate positive outcomes or impact), systematic social inquiry
(the commitment to systematic use of social scientific methods), and epistemology
(considerations about the nature and validity of knowledge). As Christie and Alkin
(2013) note, each of these roots has motivated, influenced, and provided a foun-
dational rationale for the conceptualization and development of the practice and
profession of evaluation.
We discuss the main, central branch of the theory tree first: methods. Growing
directly from the social inquiry root, the methods branch concerns research meth-
odology, including designs, methods, approaches, and techniques for conducting
evaluations. Historically, the branch grew from the work of Campbell and Stanley
(1963) on experimental and quasi-experimental designs, and centered on the valid-
ity threats and solutions associated with these designs. Since then, the branch has
grown and diversified to include a broader range of tools (program theory), evalua-
tion approaches (theory-based evaluation), research designs (covering experimental
and quasi-experimental designs, and including systematic reviews), data collection
methods, and analytical techniques, spanning across the quantitative and qualita-
tive traditions.
The valuing branch concerns the ways in which evaluators determine the “merit” and “worth” of a program or policy as part of an evaluation (Scriven, 1991).
The remaining branch on the evaluation tree considers evaluation use, which,
as an area of inquiry, involves identifying what factors are associated with use of
an evaluation. Historically, the use of evaluations has always been a central con-
cern in the practice and profession of evaluation. As described by Alkin and King
(2016), over the years evaluators have cycled through various conceptualizations
of “use,” including instrumental, conceptual, and symbolic use. Most prominently,
instrumental use refers to the direct use of evaluation findings for decision making.
Conceptual use or enlightenment implies that an evaluation is used “to change lev-
els of knowledge, understanding, and attitude” (Peck & Gorzalski, 2009, p. 141).
Symbolic use refers to an evaluation being used “to convince others of a political
position” (Peck & Gorzalski, 2009, p. 141). Currently, a prominent concept of use is
that of process use, which refers to the potential utility of stakeholders’ participa-
tion in the evaluation process. In extending these conceptualizations of use, a broad
range of participatory and utilization-focused evaluation approaches—which all
focus on promoting use—are associated with this branch (e.g., Patton, 2011).
As this discussion reveals, there is some overlap across the three main branches
of the evaluation tree. This overlap is necessary given the integrated nature of meth-
ods, valuing, and use in evaluation practice. As we note later, several of the identi-
fied trends discussed below could have been placed on more than one branch (for
example, see the description below of collaborative evaluation approaches under
the use branch, which also has important methods aspects). In locating the identified
trends on the evaluation tree, we have decided to follow in the footsteps of Christie
and Alkin (2013), placing the trends in line with their primary emphasis.
Having defined the three branches, we turn to discussing recent trends that align
with each. The identified trends are not equally relevant or applicable across the many sectors and areas within the broad scope of evaluation; this is to be expected given the diverse nature of evaluation as a field of practice. Considered collectively, however, the identified trends are fundamental to the future development of the practice and profession of evaluation.
Of the three branches on the evaluation tree, and in addition to being the main,
central branch, the methods branch is also the fastest growing. This is not surpris-
ing given the prominent role of methods in the day-to-day practice of evaluation:
evaluators are, after all, a pragmatic bunch of practitioners. Reflecting on recent de-
velopments on the methods branch, we identify three prominent trends: (1) big data
analytics, (2) understanding how and why programs work, and (3) complexity theory
and systems thinking.
Methods Trend 1: Big Data Analytics. A prominent development on the methods branch, and
one that will likely grow and expand for decades to come, is the emergence of big data
analytics (Bamberger, 2016; Raftree & Bamberger, 2014). Big data refers to new data sources (e.g., tweets, data warehouses managed by corporations and state entities).
A basic purpose of evaluation is its use (e.g., in the policy process, for program
operations). Current emphasis on use centers on two prominent trends: (1) evalu-
ation capacity building and (2) collaborative/participatory evaluation approaches.
Use Trend 1: Evaluation Capacity Building. Evaluation capacity refers to the organizational
capacity to conduct and use evaluations (Cousins, Goh, Elliott, & Bourgeois, 2014).
This capacity may be in the form of evaluation staff and units; guidelines, policies, and
procedures for when and how to conduct evaluations; and systems and processes
for disseminating and making use of evaluations within the organization. Relatedly,
evaluation capacity building refers to the strategies by which evaluation capacity is
developed and maintained in organizations. The institutionalization of evaluation
within organizations can be traced back to the establishment of evaluation units in the federal
government in the 1970s (as mentioned earlier); however, the recent push for evaluation
capacity building signifies another generation of institutionalization of evaluation, a
manifestation of the integration of evaluation in the fabric of organizations. The growing
scholarship on this topic covers various theoretical conceptualizations of evaluation
capacity (Bourgeois & Cousins, 2013), measurement models and validations of these
(Nielsen, Lemire, & Skov, 2011), strategies and approaches for building evaluation
capacity (Preskill & Boyle, 2008), and syntheses of the existing literature (Labin, Duffy,
Meyers, Wandersman, & Lesesne, 2012). Evidence of evaluation capacity building in
practice can be seen in many of the Obama Administration’s evidence-based social policy
initiatives, such as the Investing in Innovation Fund (i3), Social Innovation Fund, and
Workforce Innovation Fund, which included both a requirement for prior evidence as a
condition of the grant and the conduct of a rigorous impact evaluation supported by a
federally funded technical assistance provider, where that external provider works with
evaluators to enhance the quality of their work.
Increasing evaluation capacity is also being driven by the development and
establishment of evaluation policies within federal government agencies, founda-
tions, and NGOs, among others. Evaluation policies are important in no small part
because they provide “guidance on how, when, and with what purpose evalua-
tions are carried out within an organization, that is, within a specific organizational,
cultural, and political context” (Christie & Lemire, 2019).3 In this way, evaluation
policies serve as mechanisms to integrate evaluation practice and help build true
learning organizations (Preskill, 1994).
Another trend emerging from evaluation capacity building is that of evaluative thinking. First defining evaluative thinking and then providing support for key personnel in varied organizational settings to embrace that thinking holds the potential to increase evaluation capacity. That said, the current focus on evaluative thinking
is primarily conceptual. In practice, evaluators have made only limited attempts—
with even fewer applied examples—to demonstrate evaluative thinking in practice.
This observation is not raised as a critique; boundary probing is often a necessary
and useful first step when introducing a novel concept into a field of scholarship or
practice.
We find the surge of interest in evaluation capacity building and evaluative
thinking important for several reasons. This trend signifies a central commitment
to promoting evaluation in society. To do so involves socializing evidence building,
building the capacity to conduct evaluations, and institutionalizing the use of eval-
uation findings in practice. Moreover, the demand for evaluation capacity—within
federal government agencies, foundations, and NGOs, among others—indicates a
sustained interest in and commitment to using evidence. As organizations become
“learning” entities that value evaluative thinking, the use of evaluation is more
likely to be embedded into decision-making procedures.
Use Trend 2: Collaborative/Participatory Evaluation Approaches. A second trend on the use
branch—with strong methods branch connections—is that of the expansion of
collaborative and participatory approaches to evaluation. Participatory evaluation
We recognize that these two models for evaluation—participatory versus objective technician—are the source of some tension within the field. To us, both models seem
appropriate in certain circumstances. For example, if the evaluation is being conducted (and perhaps paid for) by the program itself, then the participatory model seems fitting. Conversely, if the evaluation is being conducted for or by an outsider (a current or potential future funder), then the participatory model seems inappropriate. There are settings and situations where engaging research subjects in an evaluation might jeopardize its quality; and similarly there are settings
and situations where not engaging research subjects might do so. Collecting data
from among very vulnerable populations (such as the homeless), for instance, can
be made much more effective by including those individuals in the data collection
design (here, again, is a connection to the valuing branch, which we discuss next).
In brief, it is worth considering the trade-offs and choosing the right model for the
questions and associated evaluation at hand.
The most pronounced trend in the valuing branch is the growing attention to
cultural awareness and cultural competence in evaluation practice.
Valuing Trend: Cultural Awareness in Evaluation Practice. A major area of scholarly and
practical discussion centers on how culture is conceptualized and defined in making
evaluative judgments and how evaluation practitioners, scholars, and/or policymakers
consider cultural context in practice (e.g., Chouinard & Hopson, 2015). Cultural
awareness is fundamental, especially in many evaluation contexts, including
international development evaluation and evaluation in indigenous communities,
where the dominance of Western epistemological approaches to knowledge
production is particularly problematic.
One notable example of mainstreaming cultural awareness in evaluation is
visible in the American Evaluation Association’s Guiding Principles for Evaluators.
Revised in 2018, the principles now emphasize cultural competence as a core ele-
ment of professional practice. According to the standards, “culturally competent”
evaluators are defined as those “who draw upon a wide range of evaluation theories
and methods to design and carry out an evaluation that is optimally matched to the
context. In constructing a model or theory of how the evaluation operates, the evalu-
ator reflects the diverse values and perspectives of key stakeholder groups, for good
practice” (American Evaluation Association, 2018, p. 3).
The growing integration of cultural competence in evaluation is evident in a
Centers for Disease Control and Prevention (Gervin et al., 2014) guide on practical
strategies for cultural competence in evaluation. These strategies include surfacing
and examining cultural biases and assumptions at every stage of the evaluation;
reflecting on how professional and personal backgrounds may influence our analyses
and interpretations of data; and considering the influence of these factors when
drawing evaluative conclusions.
The purpose of this article has been to consider recent trends in evaluation and
to situate these trends within the broader field of policy analysis. As our explora-
tion of the evaluation tree in the preceding pages illustrates, the development of the
practice and profession of evaluation has in fundamental ways been influenced by
changing needs and demands for evidence in society at large and in federal, state,
and local government more specifically.
This directs our attention to the central role of evaluation—and other types of
knowledge production—in providing actionable evidence for use in public policy
and program decision making. This actionable evidence role affects every aspect
of evaluation: the methods by which evidence about social programs and policies
is derived (the types of evidence called for), the valuing of the merit and worth of
social policies and programs (the standards and criteria used), and the potential use
(or misuse) of these judgments as reflective of and responsive to policy and program
decision making. From this perspective, trends in evaluation—along with trends in
any other type of knowledge production—should be viewed within broader trends
in knowledge production of actionable evidence for political decision-making
processes.
In reflecting on the six identified trends and what they tell us about developments
in knowledge production for policy decision making, this concluding discussion
makes three observations.
These are observations that we see reflected in the identified trends in evalu-
ation. The observations also cut across and reach beyond the individual branches
of the evaluation tree and consider their location in the policy analysis forest. We
consider these observations in turn.
works, to surface the inner workings of the program, then a theory-oriented evalua-
tion may be especially appropriate and credible for that purpose.
Our third observation is that the problems of our world cannot be addressed by a
single discipline. Consider climate change, large-scale human migration, and global
inequality: to solve any of these challenges, we need to defy disciplinary boundar-
ies. Evaluation requires the work of methodologists, recruitment specialists, data
collectors, statisticians, economists, writers, and communications specialists—and
often in small evaluation shops, these roles must be filled by a single person (Peck,
2018). The academic disciplines from which evaluators come to take on these varied
roles include anthropology, economics, political science, psychology, and sociology.
These disciplines bring their own lenses and tools that not only facilitate evaluation
work generally but also need to come together if we are to take on society’s biggest,
most vexing challenges.
• Multidisciplinary (i.e., people from varied disciplines working with one another);
• Cross-disciplinary (i.e., viewing one discipline from the perspective of the other);
• Interdisciplinary (i.e., integrating knowledge and methods from varied disciplines); and/or
• Transdisciplinary (i.e., creating a unity of methods and frameworks above and beyond the
individual disciplines).
Notes
The authors are grateful for the generous feedback provided by the many colleagues at Abt Associates
who read and commented on earlier versions of the article. These include John Hitchcock and Christopher
Weiss along with several participants in Abt’s Work in Progress Seminar. Most notably, Jacob Klerman
offered thoughtful feedback and guidance throughout the writing process to improve every aspect of the
article, and we sincerely thank him. The article also benefitted from constructive input from two anony-
mous reviewers. Any omissions and errors are the authors’.
1. GPRA focused on outcomes, which are a focus of some evaluation research. That said, summative eval-
uations instead focus on impacts, which are the difference in outcomes that are attributable to the program
or policy being evaluated. Barnow and Smith (2004) argued that GPRA’s incentives—at least in the
job training arena—served to reduce federal government focus on impacts by placing its emphasis on
performance management outcomes.
2. As noted by one of the reviewers, the meaning of the term valuing in the context of evaluation is dis-
tinct from the use of the term in the context of cost-benefit analyses, where the costs and outcomes
of an intervention are translated (and valued) in monetary terms. In the context of evaluation,
valuing refers to the act of attributing a value to a program. This act of valuing may of course involve
(but is by no means limited to) cost-benefit analyses (and outcomes expressed in monetary terms).
3. One such example is that of the U.S. Department of Health and Human Services, Administration for
Children and Families’ Evaluation Policy, available at https://www.acf.hhs.gov/opre/resource/acf-
evaluation-policy.
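Footnote 1 distinguishes outcomes (observed levels) from impacts (differences attributable to the program). A minimal sketch with hypothetical numbers illustrates the arithmetic, assuming a randomized design in which the control group's mean outcome stands in for the counterfactual (the data and variable names are illustrative, not drawn from any study cited here):

```python
# Hypothetical outcome data (e.g., annual earnings) from a randomized evaluation.
treatment_outcomes = [52000, 48000, 51000, 49000]  # program participants
control_outcomes = [47000, 46000, 48000, 45000]    # control group

# An *outcome* is the observed level -- what GPRA-style performance
# measurement tracks for program participants alone.
mean_treatment_outcome = sum(treatment_outcomes) / len(treatment_outcomes)
mean_control_outcome = sum(control_outcomes) / len(control_outcomes)

# An *impact* is the difference in outcomes attributable to the program:
# the treatment group's mean minus the control group's (counterfactual) mean.
impact = mean_treatment_outcome - mean_control_outcome
print(impact)  # 3500.0
```

Performance management in the GPRA mold reports only the first quantity; a summative evaluation estimates the second.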
References
Abt Associates. 2017. “Abt Associates to Tackle Vector-Borne Diseases Worldwide.” News Post. https://
www.abtassociates.com/who-we-are/news/news-releases/abt-associates-to-tackle-vector-borne-
diseases-worldwide-wins-major. Accessed October 29, 2019.
Alkin, Marvin C., and Jean A. King. 2016. “The Historical Development of Evaluation Use.” American
Journal of Evaluation 37 (4): 568–79.
American Evaluation Association. 2018. The 2018 American Evaluation Association Guiding Principles for
Evaluators. https://www.eval.org/p/cm/ld/fid=51. Accessed October 29, 2019.
Bamberger, Michael. 2016. Integrating Big Data into the Monitoring and Evaluation of Development Programmes.
http://unglobalpulse.org/sites/default/files/IntegratingBigData_intoMEDP_web_UNGP.pdf.
Accessed October 29, 2019.
Barnow, Burt S., and Jeffrey A. Smith. 2004. “Performance Management of US Job Training Programs:
Lessons from the Job Training Partnership Act.” Public Finance & Management 4 (3): 247–87.
Beach, Derek, and Rasmus B. Pedersen. 2013. Process Tracing Methods: Foundations and Guidelines. Ann
Arbor, MI: The University of Michigan Press.
Bell, Stephen H., and Laura R. Peck. 2016. “On the ‘How’ of Social Experiments: Experimental Designs for
Getting Inside the Black Box.” New Directions for Evaluation 152: 97–107.
Bell, Winifred. 1983. Contemporary Social Welfare. New York, NY: MacMillan.
Bloom, Howard S., Carolyn J. Hill, and James A. Riccio. 2003. “Linking Program Implementation and
Effectiveness: Lessons from a Pooled Sample of Welfare-to-Work Experiments.” Journal of Policy
Analysis and Management 22 (4): 551–75.
Bourgeois, Isabelle, and J. Bradley Cousins. 2013. “Understanding Dimensions of Organizational
Evaluation Capacity.” American Journal of Evaluation 34 (3): 299–319.
Buckley, Jane, Thomas Archibald, Monica Hargraves, and William M. Trochim. 2015. “Defining and
Teaching Evaluative Thinking: Insights from Research on Critical Thinking.” American Journal of
Evaluation 36 (3): 375–88.
Campbell, Donald T. 1969. “Reforms as Experiments.” American Psychologist 24: 409–29.
Campbell, Donald T., and Julian C. Stanley. 1963. Experimental and Quasi-Experimental Designs for Research.
Belmont, CA: Wadsworth.
Chen, Huey T., and Peter H. Rossi. 1983. “Evaluating with Sense: The Theory-Driven Approach.”
Evaluation Review 7 (3): 283–302.
Cho, Sung-Woo. 2016. “The Tension between Access and Success: Challenges and Opportunities for
Community Colleges in the United States.” In Widening Higher Education Participation: A Global
Perspective, ed. Mahsood Shah, Anna Bennett, and Erica Southgate. Oxfordshire, UK: Elsevier Ltd.
Chouinard, Jill, and Rodney Hopson. 2015. “Toward a More Critical Exploration of Culture in International
Development Evaluation.” Canadian Journal of Program Evaluation 30 (3): 248–76.
Christie, Christina A., and Marvin C. Alkin. 2013. “An Evaluation Theory Tree.” In Evaluation Roots:
Tracing Theorists’ Views and Influences, 2nd ed., ed. Marvin C. Alkin. Thousand Oaks, CA: SAGE
Publications, 11–57.
Christie, Christina A., Moira Inkelas, and Sebastian Lemire. 2017. “Improvement Science in Evaluation:
Methods and Uses.” New Directions for Evaluation 153: 1–103.
Christie, Christina A., and Sebastian Lemire. 2019. “Why Evaluation Theory Should Be Used to Inform
Evaluation Policy.” American Journal of Evaluation 40 (4): 490–508. https://doi.org/10.1177/10982
14018824045
Cody, Scott, and Andrew Asher. 2014. Proposal 14: Smarter, Better, Faster: The Potential for Predictive
Analytics and Rapid-Cycle Evaluation to Improve Program Development and Outcomes. http://www.
hamiltonproject.org/papers/predictive_analytics_rapid-cycle_evaluation_improve_program_out-
comes. Accessed October 29, 2019.
Cousins, Brad J., Swee C. Goh, Catherine J. Elliott, and Isabelle Bourgeois. 2014. “Framing the Capacity to
Do and Use Evaluation.” New Directions for Evaluation 141: 7–23.
Donaldson, Stewart I., Christina A. Christie, and Melvin M. Mark. 2009. What Counts as Credible Evidence
in Applied Research and Evaluation Practice. Thousand Oaks, CA: Sage.
Fabra-Mata, Javier, and Jesper Mygind. 2019. “Big Data in Evaluation: Experiences from Using Twitter
Analysis to Evaluate Norway’s Contribution to the Peace Process in Colombia.” Evaluation 25 (1):
6–22.
Forss, Kim, Mita Marra, and Robert Schwartz. 2011. Evaluating the Complex: Attribution, Contribution, and
Beyond. New Brunswick, NJ: Transaction Publishers.
Friedman, Willa, Benjamin Woodman, and Minki Chatterji. 2015. “Can Mobile Phone Messages to Drug
Sellers Improve Treatment of Childhood Diarrhoea? A Randomized Controlled Trial in Ghana.”
Health Policy and Planning 30 (Suppl 1): i82–92.
Funnell, Sue C., and Patricia J. Rogers. 2011. Purposeful Program Theory: Effective Uses of Theories of Change
and Logic Models. San Francisco, CA: Jossey-Bass.
Comptroller General. 1980. Federal Evaluations (1980 Congressional Sourcebook Series PAD-80-48).
Washington, DC: U.S. Government Printing Office.
Gervin, Derrick, Robin Kuwahara, Rashon Lane, Sarah Gill, Refilwe Moeti, and Maureen Wilce. 2014.
Practical Strategies for Culturally Competent Evaluation. Atlanta, GA: Centers for Disease Control and
Prevention, U.S. Department of Health and Human Services.
Jensenius, Alexander R. 2012. Disciplinarities: Intra, Cross, Multi, Inter, Trans. Blog post. https://www.arj.
no/2012/03/12/disciplinarities-2/. Accessed October 29, 2019.
Jones, Harry, and Simon Hearn. 2009. Outcome Mapping: A Realistic Alternative for Planning, Monitoring,
and Evaluation. https://www.odi.org/sites/odi.org.uk/files/odi-assets/publications-opinion-files/
5058.pdf. Accessed October 29, 2019.
Kubisch, Anne C. 2010. “Recent History of Community Change Efforts in the United States.” In Voices
from the Field III: Lessons and Challenges from Two Decades of Community Change Efforts, ed. Anne C.
Kubisch, Patricia Auspos, Prudence Brown, and Tom Dewar. Washington, DC: Aspen Institute.
Labin, Susan S., Jennifer L. Duffy, Duncan C. Meyers, Abraham Wandersman, and Catherine A. Lesesne.
2012. “A Research Synthesis of the Evaluation Capacity Building Literature.” American Journal of
Evaluation 33 (3): 307–38.
Lemire, Sebastian, and Gustav J. Petersson. 2017. “Big Bang or Big Bust? The Role and Implications of Big
Data in Evaluation.” In Cyber Society, Big Data, and Evaluation, ed. Gustav J. Petersson, and Jonathan
D. Breul. New Brunswick, NJ: Transaction Publishers, 215–36.
Meit, Michael, Carol Hafford, Catharine Fromknecht, Noelle Miesfeld, Tori Nadel, and Emily Phillips.
2017. Principles to Guide Research with Tribal Communities: The Tribal HPOG 2.0 Evaluation in Action.
OPRE Report OPRE 2017–61. Washington, DC: U.S. Department of Health and Human Services,
Administration for Children and Families, Office of Planning, Research, and Evaluation.
National Academies of Sciences, Engineering, and Medicine. 2017. Principles and Practices for Federal
Program Evaluation: Proceedings of a Workshop. Washington, DC: The National Academies Press.
Nielsen, Steffen B., Sebastian Lemire, and Majbrit Skov. 2011. “Measuring Evaluation Capacity—Results
and Implications of a Danish Study.” American Journal of Evaluation 32 (3): 324–44.
Patton, Michael Q. 1997. Utilization-Focused Evaluation: The New Century Text. Thousand Oaks, CA: Sage
Publications.
. 2011. Developmental Evaluation—Applying Complexity Concepts to Enhance Use and Innovation. New
York, NY: Guilford Press.
Pattyn, Valérie, Astrid Molenveld, and Barbara Befani. 2017. “Qualitative Comparative Analysis as
an Evaluation Tool: Lessons from an Application in Development Settings.” American Journal of
Evaluation 40 (1): 55–74.
Pawson, Ray, and Nick Tilley. 1997. Realistic Evaluation. Thousand Oaks, CA: SAGE Publications.
Peck, Laura R. 2003. “Subgroup Analysis in Social Experiments: Measuring Program Impacts Based on
Post Treatment Choice.” American Journal of Evaluation 24 (2): 157–87.
. 2013. “On Analysis of Symmetrically-Predicted Endogenous Subgroups: Part One of a Method
Note in Three Parts.” American Journal of Evaluation 34 (2): 225–36.
. 2015. “Using Impact Evaluation Tools to Unpack the Black Box and Learn What Works.” Journal
of Multidisciplinary Evaluation 11 (24): 54–67.
. 2018. “The Big Evaluation Enterprises in the United States.” New Directions for Evaluation 160:
97–124.
. 2020. Experimental Evaluation Design for Program Improvement. Thousand Oaks, CA: SAGE
Publications.
Peck, Laura R., and Lindsey M. Gorzalski. 2009. “An Evaluation Use Framework and Empirical
Assessment.” Journal of Multidisciplinary Evaluation 6 (12): 139–56.
Peck, Laura R., Yushim Kim, and Joanna Lucio. 2012. “An Empirical Examination of Validity in
Evaluation.” American Journal of Evaluation 33 (3): 350–65.
Preskill, Hallie. 1994. “Evaluation’s Role in Enhancing Organizational Learning: A Model for Practice.”
Evaluation and Program Planning 17 (3): 291–97.
Preskill, Hallie, and Shanelle Boyle. 2008. “A Multidisciplinary Model of Evaluation Capacity Building.”
American Journal of Evaluation 29 (4): 443–59.
Pressman, Jeffrey L., and Aaron Wildavsky. 1984. Implementation: How Great Expectations in Washington
Are Dashed in Oakland; Or, Why It’s Amazing that Federal Programs Work at All, This Being a Saga of the
Economic Development Administration as Told by Two Sympathetic Observers Who Seek to Build Morals on
a Foundation of Ruined Hopes: The Oakland Project, 3rd ed. Berkeley, CA: University of California Press.
Raftree, Linda, and Michael Bamberger. 2014. Emerging Opportunities: Monitoring and Evaluation in a Tech-
Enabled World. London, UK: ITAD and The Rockefeller Foundation. https://assets.rockefellerfoun
dation.org/app/uploads/20150911122413/Monitoring-and-Evaluation-in-a-Tech-Enabled-World.
pdf. Accessed October 29, 2019.
Ragin, Charles C. 2014. The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies.
Oakland, CA: University of California Press.
Schmitt, Johannes. Forthcoming. “Causal Mechanisms in Evaluation.” New Directions for Evaluation.
Scriven, Michael. 1991. Evaluation Thesaurus, 4th ed. Newbury Park, CA: Sage.
. 2019. “The Checklist Imperative.” New Directions for Evaluation 163: 49–60.
Shadish, William R., Thomas D. Cook, and Laura C. Leviton. 1991. Foundations of Program Evaluation:
Theories of Practice. Newbury Park, CA: Sage.
Springer, J. Fred, Peter J. Haas, and Allan Porowski. 2017. Applied Policy Research: Concepts and Cases, 2nd
ed. New York, NY: Routledge.
Springer, J. Fred, and Allan Porowski. 2012. Natural Variation Logic and the DFC Contribution to Evidence-
Based Practice. Washington, DC: Presentation to the Society for Prevention Research.
Stember, Marilyn. 1991. “Advancing the Social Sciences through the Interdisciplinary Enterprise.” The
Social Science Journal 28 (1): 1–14.
U.S. Government Accountability Office. 1992. Evaluation Issues. Washington, DC: Comptroller General.
Vanderkruik, Rachel, and Marianne E. McPherson. 2017. “A Contextual Factors Framework to Inform
Implementation and Evaluation of Public Health Initiatives.” American Journal of Evaluation 38 (3):
348–59.
Weimer, David, and Aidan R. Vining. 2017. Policy Analysis: Concepts and Practice, 6th ed. New York, NY:
Routledge.
Weiss, Carol H. 1998. Evaluation. Upper Saddle River, NJ: Prentice-Hall.
Westbrook, T’Pring R., Sarah A. Avellar, and Neil Seftor. 2017. “Reviewing the Reviews: Examining
Similarities and Differences Between Federally Funded Evidence Reviews.” Evaluation Review 41
(3): 183–211.
Williams, Bob, and Richard Hummelbrunner. 2011. System Concepts in Action: A Practitioner’s Toolkit.
Stanford, CA: Stanford University Press.
Wilson-Grau, Ricardo, and Heather Britt. 2012. Outcome Harvesting. Ford Foundation. https://www.
outcomemapping.ca/download/wilsongrau_en_Outome%20Harvesting%20Brief_revised%20Nov
%202013.pdf. Accessed October 29, 2019.