

The New Century Text


Michael Quinn Patton

SAGE Publications

International Educational and Professional Publisher
Thousand Oaks London New Delhi
Copyright © 1997 by Sage Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

For information address:

SAGE Publications, Inc.

2455 Teller Road

Thousand Oaks, California 91320
SAGE Publications Ltd.
6 Bonhill Street
London EC2A 4PU
United Kingdom
SAGE Publications India Pvt. Ltd.
M-32 Market
Greater Kailash I
New Delhi 110 048 India

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

Patton, Michael Quinn.

Utilization-focused evaluation: the new century text / author,
Michael Quinn Patton. — 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-8039-5265-1 (pbk.: acid-free paper). — ISBN 0-8039-5264-3
(cloth: acid-free paper)
1. Evaluation research (Social action programs)—United States.
I. Title.
H62.5.U5P37 1996
361.6'1'072—dc20 96-25310

06 07 08 13 12 11 10

Acquiring Editor: C. Deborah Laughton

Editorial Assistant: Dale Grenfell
Production Editor: Diana E. Axelsen
Production Assistant: Sherrise Purdum
Typesetter/Designer: Janelle LeMaster
Cover Designer: Ravi Balasuriya
Print Buyer: Anna Chin

Preface xiii

PART 1. Toward More Useful Evaluations 1

1. Evaluation Use: Both Challenge and Mandate 3

2. What Is Utilization-Focused Evaluation? How Do You Get Started? 19
3. Fostering Intended Use by Intended Users: The Personal Factor 39
4. Intended Uses of Findings 63
5. Intended Process Uses: Impacts of Evaluation Thinking and Experiences 87

PART 2. Focusing Evaluations: Choices, Options, and Decisions 115

6. Being Active-Reactive-Adaptive: Evaluator Roles,
Situational Responsiveness, and Strategic Contingency Thinking 117
7. Beyond the Goals Clarification Game: Focusing on Outcomes 147
8. Focusing an Evaluation: Alternatives to Goals-Based Evaluation 177
9. Implementation Evaluation: What Happened in the Program? 195
10. The Program's Theory of Action: Conceptualizing Causal Linkages 215

PART 3. Appropriate Methods 239

11. Evaluations Worth Using: Utilization-Focused Methods Decisions 241

12. The Paradigms Debate and a Utilitarian Synthesis 265
13. Deciphering Data and Reporting Results: Analysis, Interpretations,
Judgments, and Recommendations 301
PART 4. Realities and Practicalities of Utilization-Focused Evaluation 339
14. Power, Politics, and Ethics 341
15. Utilization-Focused Evaluation: Process and Premises 371

References 387

Index 415

About the Author 431

Table of Contents


PART 1. Toward More Useful Evaluations 1

1. Evaluation Use: Both Challenge and Mandate 3

Evaluation Use as a Critical Societal Issue 4
High Hopes for Evaluation 6
Historical Context 10
New Directions in Accountability 12
Standards of Excellence for Evaluation 15

2. What Is Utilization-Focused Evaluation? How Do You Get Started? 19

A Comprehensive Approach 20
The First Challenge: Engendering Commitment 22
Charitable Assessment 25
Learning to Value Evaluation 26
Generating Real Questions 29
Creative Beginnings 33

3. Fostering Intended Use by Intended Users: The Personal Factor 39

The First Step in Utilization-Focused Evaluation 41
Evaluation's Premier Lesson 50
Practical Implications of the Personal Factor 50
User-Focused Evaluation in Practice 58
Beyond Just Beginning 60

4. Intended Uses of Findings 63

Identifying Intended Uses From the Beginning 63
Three Uses of Findings 64
Applying Purpose and Use Distinctions 75
Connecting Decisions to Uses 84

5. Intended Process Uses: Impacts of Evaluation Thinking and Experiences 87

Process as Outcome 88
Process Use Defined 90
A Menu: Uses of Evaluation Logic and Processes 90
Using Evaluation to Enhance Shared Understandings 91
Evaluation as an Integral Programmatic Intervention 93
Supporting Engagement, Self-Determination, and Ownership:
Participatory, Collaborative, and Empowerment Evaluation 97
Program and Organization Development: Developmental Evaluation 103
Concerns, Controversies, and Caveats 110

PART 2. Focusing Evaluations: Choices, Options, and Decisions 115

6. Being Active-Reactive-Adaptive: Evaluator Roles, Situational Responsiveness,
and Strategic Contingency Thinking 117
Evaluation Conditions 118
Changing Uses Over Time 119
Variable Evaluator Roles Linked to Variable Evaluation Purposes 121
Situational Evaluation 126
Internal and External Evaluators 138

7. Beyond the Goals Clarification Game: Focusing on Outcomes 147

Evaluation of the Bear Project 147
Whose Goals Will Be Evaluated? 148
Communicating About Goals and Results 153
Focusing on Outcomes and Results 154
Utilization-Focused Outcomes Framework 158
Meaningful and Useful Goals 167
Levels of Goal Specification 169
The Personal Factor Revisited 174
8. Focusing an Evaluation: Alternatives to Goals-Based Evaluation 177
More Than One Way to Manage a Horse 177
Problems With Goals-Based Evaluation 179
Goal-Free Evaluation 181
A Menu Approach to Focusing Evaluations 184
Changing Focus Over Time: Stage Models of Evaluation 187

9. Implementation Evaluation: What Happened in the Program? 195

Checking the Inventory 196
The Importance of Implementation Analysis 197
Focus on Utility: Information for Action and Decisions 199
Ideal Program Plans and Actual Implementation 200
Variations and Options in Implementation Evaluation 205
Connecting Goals and Implementation 211

10. The Program's Theory of Action: Conceptualizing Causal Linkages 215

All the World's a Stage for Theory 215
Mountaintop Inferences 216
Reflections on Causality in Evaluation 216
The Theory Option in Evaluation: Constructing a Means-Ends Hierarchy 217
Three Approaches to Program Theory 219
User-Focused Theory of Action Approach 221
Getting at Assumptions and Causal Connections 225
Developing a Theory of Action as Process Use 229
Theory Informing Practice, Practice Informing Theory 232
Utilization-Focused Evaluation Theory of Action 234
Causal Theorizing in Perspective 237

PART 3. Appropriate Methods 239

11. Evaluations Worth Using: Utilization-Focused Methods Decisions 241

Methods to Support Intended Uses, Chosen by Intended Users 241
The Million Man March 244
Methods and Measurement Options 247
Assuring Methodological Quality and Excellence 248
Credibility and Use 250
Overall Evaluation Validity 251
Believable and Understandable Data 253
Trade-Offs 257
Truth and Utility 259
Designing Evaluations Worth Using: Reflections on the State of the Art 264

12. The Paradigms Debate and a Utilitarian Synthesis 265

Training 265
The Paradigms Debate 267
Dimensions of the Competing Paradigms 272
Whither the Evaluation Methods Paradigms Debate?
The Debate Has Withered 290
Utilization-Focused Synthesis: A Paradigm of Choices 297

13. Deciphering Data and Reporting Results: Analysis, Interpretations,
Judgments, and Recommendations 301
Setting the Stage for Use 302
A Framework for Reviewing Data 307
Arranging Data for Ease of Interpretation: Focusing the Analysis 307
Simplicity in Data Presentations 309
Interpretations and Judgments 315
Making Claims 321
Useful Recommendations 324
Controversy About Recommendations 326
A Futures Perspective on Recommendations 328
Utilization-Focused Reporting 329
Utilization-Focused Reporting Principles 330
Final Reflections 337

PART 4. Realities and Practicalities of Utilization-Focused Evaluation 339

14. Power, Politics, and Ethics 341

Politics and Evaluation: A Case Example 341
The Political Nature of Evaluation 343
The Power of Evaluation 348
Political Maxims for Utilization-Focused Evaluators 350
The Political Foundations of Organizing Stakeholders
Into an Evaluation Task Force 352
Political Rules in Support of Use 356
Fears of Political Co-optation 357
Evaluation Misuse 359
Ethics of Being User-Focused 361
Guarding Against Corruption of an Evaluation 365
Moral Discourse 366

15. Utilization-Focused Evaluation: Process and Premises 371

A User's Perspective 371
The Flow of a Utilization-Focused Evaluation Process 376
The Achilles' Heel of Utilization-Focused Evaluation 380
Fundamental Premises of Utilization-Focused Evaluation 381
A Vision of an Experimenting Society and Experimenting Evaluators 384

References 387

Index 415

About the Author 431


Sufi stories are tales used to pass on ancient wisdom. One such story tells of a revered teacher,
Mulla Nasrudin, who was asked to return to his home village to share his wisdom with the
people there.

Mulla Nasrudin mounted a platform in the village square and asked rhetorically,
"O my people, do you know what I am about to tell you?"
Some local rowdies, deciding to amuse themselves, shouted rhythmically,
"NO . . . ! NO . . . ! NO . . . ! NO . . . !"
"In that case," said Mulla Nasrudin with dignity, "I shall abstain from trying to
instruct such an ignorant community," and he stepped down from the platform.
The following week, having obtained an assurance from the hooligans that they
would not repeat their harassment, the elders of the village again prevailed upon
Nasrudin to address them. "O my people," he began again, "do you know what I am
about to say to you?"
Some of the people, uncertain as to how to react, for he was gazing at them
fiercely, muttered, "Yes."
"In that case," retorted Nasrudin, "there is no need for me to say more." He then
left the village square.
On the third occasion, after a deputation of elders had again visited him and
implored him to make one further effort, he stood before the people: "O my people!
Do you know what I am about to say?"
Since he seemed to demand a reply, the villagers shouted, "Some of us do, and
some of us do not."
"In that case," said Nasrudin as he withdrew, "Let those who know teach those
who do not."
—Adapted from Shah, 1964:80-81


This book records the things that I have learned about doing program evaluation from those who know. The pages that follow represent an accumulation of wisdom from many sources: from interviews with 40 federal decision makers and evaluators who participated in a study of the use of federal health evaluations; from conversations with program staff and funders about their evaluation experiences; from evaluation colleagues; from participants in my evaluation workshops and university classes, who are struggling to conduct useful evaluations; and from 25 years of evaluation practice.

The evaluation profession has developed dramatically since the last edition of this book 10 years ago. Updating this edition with recent evaluation research and thinking proved a formidable task, and it substantially increased the length of the book because so much has happened on so many fronts. New chapters have been added on new forms of uses, alternative roles for evaluators, and new concerns about ethics. Yet, the central challenge to professional practice remains—doing evaluations that are useful and actually used!

The tone and substance of this new edition have been influenced by the fact that utilization-focused evaluation is now more than 20 years old. The first edition, published in 1978 and based on research done in 1975, had the tone of a toddler throwing a temper tantrum because no one seemed to be paying attention. The second edition, published in 1986, was alternatively brash and shy, assertive and uncertain, like an adolescent coming of age. By that time, the first edition had attracted both praise and skepticism, support and opposition, and the premises undergirding the approach had been sufficiently disseminated to be distorted, misquoted, and miscategorized. I wanted the second edition to set the record straight and clarify points of confusion. By my own criteria, I only partially succeeded, and reading that edition now, after having received useful feedback from students and teachers of evaluation, I find it less clear on some points than I would have wished. I have attempted to correct those deficiencies.

Now that utilization-focused evaluation has survived to voting age (or even drinking age), I feel liberated to be more celebratory and less argumentative in tone. While my colleagues Joe Wholey, Harry Hatry, and Kathryn Newcomer (1994) may have overstated the case when they observed that "in recent years the watchword of the evaluation profession has been utilization-focused evaluation" (p. 5), I can say without hubris that the widespread acceptance of the premises of utilization-focused evaluation has influenced my voice. In this edition, I have strived to achieve the more mature tone of the elder, which I find I'm becoming. My professional development parallels the maturation of our profession. As a field of professional practice, we have reached a level where we know what we're doing and have a track record of important contributions to show. That knowledge and those contributions are the bedrock of this new edition.

While I have learned from and am indebted to many more people than I can acknowledge, the personal and professional contributions of a few special colleagues have been especially important to me in recent years, particularly in the writing of this edition. Marv Alkin, Jean King, and Hallie Preskill read portions of the revision and offered instructive feedback. Other colleagues whose writings and wisdom have informed this edition include Eleanor Chelimsky, Huey Chen, Bob Covert, David Fetterman, Mike Hendricks, Ernie House, Ricardo Millett, Sharon Rallis, Jim Sanders, Michael Scriven, Will Shadish, Midge Smith, Yoland Wadsworth, Carol Weiss, and Joe Wholey. Minnesota provides a thriving evaluation community in which to work and an active local chapter of the American Evaluation Association where friends and colleagues share experiences; among local evaluators who have been especially helpful to me in recent years are John Brandl, Tom Dewar, Jean King, Dick Krueger, Steve Mayer, Paul Mattessich, Marsha Mueller, Ruth Anne Olson, Greg Owen, and Stacey Stockdill. I also want to thank several colleagues and clients currently or formerly in government who have contributed ideas and experiences that have influenced this edition: Valerie Caracelli, Kay Knapp, Gene Lyle, Meg Hargreaves, Laurie Hestness, Dennis Johnson, Mike Linder, Richard Sonnichsen, and Jennifer Thurman. I thank the Union Institute Graduate School, especially Dean Larry Ryan, for sabbatical support to complete this revision. Ongoing encouragement from Union Institute faculty and learners supports both my teaching and writing.

That this new edition was written at all owes much to the patient nurturing and unwavering support of Sage editor C. Deborah Laughton. Sage has a commitment to keep major texts current, but what began as an update became, for me, a major rewrite as I worked to capture all the new developments in evaluation over the last decade. When I was tempted to go on to other projects, C. Deborah helped rekindle my commitment to this book. Her knowledge about both good writing and evaluation made the difference. Expert and thorough copy editing by Jacqueline A. Tasch also contributed by enhancing the quality of the final production.

Jeanne Campbell has been editor, critic, colleague, and collaborator. Most of all, she has been a source of power through her caring, belief, and support. She has helped me keep my priorities straight in the struggle to balance family, writing, teaching, and consulting, and somehow integrating them all in a rich and loving life together with our children. My daily experience of her provides ongoing evidence that getting older does mean getting better. I dedicate this book to her.

One final note of thanks to evaluation sage Halcolm (pronounced and interpreted, How come? as in "Why?"). Since the first edition, rumors have persisted that Halcolm doesn't really exist despite stories and quotations from him in my writings. Such ignominious scuttlebutt notwithstanding, I can assure the reader that Halcolm exists vitally in my mind.

This book is both practical and theoretical. It tells readers how to conduct program evaluations and why to conduct them in the manner prescribed. Each chapter contains both a review of the relevant literature and actual case examples to illustrate major points. Over 50 menus and exhibits have been added to this edition, with exhibits offering summaries and illustrations, and menus designed to present options as evaluators work with users to make selections from the vast smorgasbord of evaluation approaches. Finally, the book offers a definite point of view developed from the observation that much of what has passed for program evaluation has not been very useful; that evaluation ought to be useful; and, therefore, that something different must be done if evaluation is to be useful. Based on research and professional experience, and integrating theory and practice, this book provides both an overall framework and concrete advice for how to conduct useful evaluations.
Toward More Useful Evaluations

In the beginning, God created the heaven and the earth.

And God saw everything that he made. "Behold," God said, "it is very good."
And the evening and the morning were the sixth day.
And on the seventh day God rested from all His work. His archangel came then unto
Him asking, "God, how do you know that what you have created is 'very good'? What are
your criteria? On what data do you base your judgment? Just exactly what results were
you expecting to attain? And aren't you a little close to the situation to make a fair and
unbiased evaluation?"
God thought about these questions all that day and His rest was greatly disturbed. On
the eighth day God said, "Lucifer, go to hell."
Thus was evaluation born in a blaze of glory. . . .
—From Halcolm's The Real Story of Paradise Lost
Evaluation Use:
Both Challenge and Mandate

The human condition: insidious prejudice, stultifying fear of the unknown, contagious avoidance, beguiling distortion of reality, awesomely selective perception, stupefying self-deception, profane rationalization, massive avoidance of truth—all marvels of evolution's selection of the fittest. Evaluation is our collective effort to outwit these human propensities—if we choose to use it.

On a cold November morning in Minnesota, some 15 people in various states of

wakefulness have gathered to discuss evaluation of a county human services program.
Citizen evaluation advisory board representatives are present; the county board and
state representatives have arrived; and members of the internal evaluation staff are busy
with handouts and overheads. We are assembled at this early hour to review the past
year's evaluation efforts.
They review the problems with getting started (fuzzy program goals, uncertain
funding); the data collection problems (lack of staff, little program cooperation,
inconsistent state and county data processing systems); the management problems
(unclear decision-making hierarchies, political undercurrents, trying to do too much);
and the findings despite it all ("tentative to be sure," acknowledges the internal
evaluator, "but more than we knew a year ago").
Advisory board members are clearly disappointed: "The data just aren't solid
enough." A county commissioner explains why board decisions have been contrary to
evaluation recommendations: "We didn't really get the information we needed when


we wanted it, and it wasn't what we wanted when we got it." The room is filled with
disappointment, frustration, defensiveness, cynicism, and more than a little anger. There
are charges, countercharges, budget threats, moments of planning, and longer moments
of explaining away problems. The chairperson ends the meeting in exasperation,
lamenting: "What do we have to do to get results we can actually use?"

This book is an outgrowth of, and answer to, that question.

Evaluation Use as a Critical Societal Issue

If the scene I have described were unique, it would merely represent a frustrating professional problem for the people involved. But if that scene is repeated over and over on many mornings, with many advisory boards, then the question of evaluation use would become what sociologist C. Wright Mills (1959) called a critical public issue:

Issues have to do with matters that transcend these local environments of the individual and the range of his inner life. They have to do with the organization of many such milieux into the institutions of an historical society as a whole. . . . An issue, in fact, often involves a crisis in institutional arrangements. (pp. 8-9)

In my judgment, the challenge of using evaluation in appropriate and meaningful ways represents just such a crisis in institutional arrangements. How evaluations are used affects the spending of billions of dollars to fight problems of poverty, disease, ignorance, joblessness, mental anguish, crime, hunger, and inequality. How are programs that combat these societal ills to be judged? How does one distinguish effective from ineffective programs? And how can evaluations be conducted in ways that lead to use? How do we avoid producing reports that gather dust on bookshelves, unread and unused? Those are the questions this book addresses, not just in general, but within a particular framework: utilization-focused evaluation.

The issue of use has emerged at the interface between science and action, between knowing and doing. It raises fundamental questions about human rationality, decision making, and knowledge applied to creation of a better world. And the issue is as fresh as the morning news. To wit, a recent newspaper headline: "Agency Evaluation Reports Disregarded by Legislators Who Had Requested Them" (Dawson 1995; see Exhibit 1.1). Let's look, then, at how the crisis in utilization has emerged. Following that, we'll outline how utilization-focused evaluation addresses this crisis.

A Larger Perspective: Using Information in the Information Age

The challenge of evaluation use epitomizes the more general challenge of knowl-

EXHIBIT 1.1
Newspaper Column on Evaluation Use

Agency Evaluation Reports Disregarded by Legislators Who Had Requested Them

Minnesota lawmakers who mandated that state agencies spend a lot of employee hours and money developing performance evaluation reports pretty much ignored them. . . . The official word from the state legislative auditor's evaluation of the performance evaluation process: Legislators who asked for the reports did not pay much attention to them. They were often full of boring and insignificant information.

Thousands of employee hours and one million taxpayer dollars went into writing the 21 major state agency performance evaluation reports. The auditor reports the sad results:

• Only three of 21 state commissioners thought that the performance reports helped the governor make budget choices regarding their agencies.
• Only seven of 21 agencies were satisfied with the attention given the reports in the House committees reviewing their programs and budgets. And only one agency was satisfied with the attention it received in the Senate.

Agency heads also complained to legislative committees this year that the 1993 law mandating the reports was particularly painful because departments had to prepare new two-year budget requests and program justifications at the same time. That "dual" responsibility resulted in bureaucratic paperwork factories running overtime.

"Our experience is that few, if any, legislators have actually read the valuable information contained in our report . . . ," one agency head told auditors.

"The benefits of performance reporting will not materialize if one of the principal audiences is uninterested," said another.

"If the Legislature is not serious about making the report 'the key document' in the budget decision process, it serves little value outside the agency," said a third department head.

Mandating the reports and ignoring them looks like another misguided venture by the 201-member Minnesota Legislature. It is the fifth-largest Legislature in the nation and during much of the early part of this year's five-month session had little to do. With time on their hands, lawmakers could have devoted more time to evaluation reports. But if the reports were dull and of little value in evaluating successes of programs, can they be blamed for not reading them?

Gary Dawson, "State Journal" column
Saint Paul Pioneer Press, August 7, 1995, p. 4B

SOURCE: Reprinted with permission of Saint Paul Pioneer Press.

edge use in our times. Our age—the Age of Information and Communications—has developed the capacity to generate, store, retrieve, transmit, and instantaneously communicate information. Our problem is keeping up with, sorting out, absorbing, and using information. Our technological capacity for gathering and computerizing information now far exceeds our human ability to process and make sense out of it all. We're constantly faced with deciding what's worth knowing versus what to ignore.

Getting people to use what is known has become a critical concern across the different knowledge sectors of society. A major specialty in medicine (compliance research) is dedicated to understanding why so many people don't follow their doctor's orders. Common problems of information use underlie trying to get people to use seat belts, quit smoking, begin exercising, eat properly, and pay attention to evaluation findings. In the fields of nutrition, energy conservation, education, criminal justice, financial investment, human services, corporate management, international development—the list could go on and on—a central problem, often the central problem, is getting people to apply what is already known.

In agriculture, a major activity of extension services is trying to get farmers to adopt new scientific methods. Experienced agricultural extension agents like to tell the story of a young agent telling a farmer about the latest food production techniques. As he begins to offer advice, the farmer interrupts him and says, "No sense in telling me all those new ideas, young man. I'm not doing half of what I know I should be doing now."

I remember coming across a follow-up study of participants in time-management training. Few were applying the time-management techniques they had learned. When graduates of time-management training were compared with a sample of nonparticipants, the differences were not in how people in each group managed their time. The time-management graduates had quickly fallen back into old habits. The difference was: the graduates felt much more guilty about how they wasted time.

Research on adolescent pregnancy illustrates another dimension of the knowledge use problem. Adolescent health specialist Michael Resnick (1984) interviewed teenagers who became pregnant. He found very few cases in which the problem was a lack of information about contraception, about pregnancy, or about how to avoid pregnancies. The problem was not applying—just not using—what they knew. Resnick found "an incredible gap between the knowledge and the application of that knowledge. In so many instances, it's heartbreaking—they have the knowledge, the awareness, and the understanding, but somehow it doesn't apply to them" (p. 15).

These examples of the challenges of putting knowledge to use are meant to set a general context for the specific concern of this book: narrowing the gap between generating evaluation findings and actually using those findings for program decision making and improvement. Although the problem of information use remains central to our age, we are not without knowledge about what to do. We've learned a few things about overcoming our human resistance to new knowledge and change, and over the last two decades of professional evaluation practice, we've learned a great deal about how to increase evaluation use. Before presenting what we've learned, let's look more closely at the scope of the challenge of using evaluation processes and findings.

High Hopes for Evaluation

Evaluation and Rationality

Edward Suchman (1967) began his seminal text on evaluation research with
Hans Zetterberg's observation that "one of the most appealing ideas of our century is the notion that science can be put to work to provide solutions to social problems" (p. 1). Social and behavioral science embodied the hope of finally applying human rationality to the improvement of society. In 1961, Harvard-educated President John F. Kennedy welcomed scientists to the White House as never before. Scientific perspectives were taken into account in the writing of new social legislation. Economists, historians, psychologists, political scientists, and sociologists were all welcomed into the public arena to share in the reshaping of modern postindustrial society. They dreamed of and worked for a new order of rationality in government—a rationality undergirded by social scientists who, if not philosopher-kings themselves, were at least ministers to philosopher-kings. Carol Weiss (1977) has captured the optimism of that period.

There was much hoopla about the rationality that social science would bring to the untidy world of government. It would provide hard data for planning . . . and give cause-and-effect theories for policy making, so that statesmen would know which variables to alter in order to effect the desired outcomes. It would bring to the assessment of alternative policies a knowledge of relative costs and benefits so that decision makers could select the options with the highest payoff. And once policies were in operation, it would provide objective evaluation of their effectiveness so that necessary modifications could be made to improve performance. (p. 4)

One manifestation of the scope, pervasiveness, and penetration of these hopes is the number of evaluation studies actually conducted. While it is impossible to identify all such studies, as early as 1976, the Congressional Sourcebook on Federal Program Evaluations contained 1,700 citations of program evaluation reports issued by 18 U.S. Executive Branch agencies and the General Accounting Office (GAO) during fiscal years 1973 through 1975 (Office of Program Analysis, GAO 1976:1). The numbers have grown substantially since then. In 1977, federal agencies spent $64 million on program evaluation and more than $1.1 billion on social research and development (Abramson 1978). The third edition of the Compendium of Health and Human Services Evaluation Studies (U.S. Department of Health and Human Services 1983) contained 1,435 entries. The fourth volume of the U.S. Comptroller General's directory of Federal Evaluations (GAO 1981) identified 1,429 evaluative studies from various U.S. federal agencies completed between September 1, 1979, and September 30, 1980. While the large number of and substantial funding for evaluations suggested great prosperity and acceptance, under the surface and behind the scenes, a crisis was building—a utilization crisis.

Reality Check: Evaluations Largely Unused

By the end of the 1960s, it was becoming clear that evaluations of Great Society social programs were largely ignored or politicized. The Utopian hopes for a scientific and rational society had somehow failed to be realized. The landing of the first human on the moon came and went, but poverty persisted despite the 1960s "War" on it—and research was still not being used as the basis for government decision making. While all types of applied social science suffered from underuse (Weiss 1977),

nonuse seemed to be particularly characteristic of evaluation studies. Ernest House (1972) put it this way: "Producing data is one thing! Getting it used is quite another" (p. 412). Williams and Evans (1969) wrote that "in the final analysis, the test of the effectiveness of outcome data is its impact on implemented policy. By this standard, there is a dearth of successful evaluation studies" (p. 453). Wholey et al. (1970) concluded that "the recent literature is unanimous in announcing the general failure of evaluation to affect decision making in a significant way" (p. 46). They went on to note that their own study "found the same absence of successful evaluations noted by other authors" (p. 48). Cohen and Garet (1975) found "little evidence to indicate that government planning offices have succeeded in linking social research and decision making" (p. 19). Seymour Deitchman (1976), in his The Best-Laid Schemes: A Tale of Social Research and Bureaucracy, concluded that "the impact of the research on the most important affairs of state was, with few exceptions, nil" (p. 390). Weidman et al. (1973) concluded that "on those rare occasions when evaluation studies have been used . . . the little use that has occurred [has been] fortuitous rather than planned" (p. 15). In 1972, Carol Weiss viewed underutilization as one of the foremost problems in evaluation research: "A review of evaluation experience suggests that evaluation results have not exerted significant influence on program decisions" (pp. 10-11). This conclusion was echoed by four prominent commissions and study committees: the U.S. House Committee on Government Operations, Research and Technical Programs Subcommittee (1967); the Young Committee report published by the National Academy of Sciences (1968); the Report of the Special Commission on the Social Sciences (1968) for the National Science Foundation; and the Social Science Research Council's (1969) prospectus on the Behavioral and Social Sciences.

British economist L. J. Sharpe (1977) reviewed the European literature and commission reports on use of social scientific knowledge and reached a decidedly gloomy conclusion:

    We are brought face to face with the fact that it has proved very difficult to uncover many instances where social science research has had a clear and direct effect on policy even when it has been specifically commissioned by government. (p. 45)

Ronald Havelock (1980) of the Knowledge Transfer Institute generalized that "there is a gap between the world of research and the world of routine organizational practice, regardless of the field" (p. 13). Rippey (1973) commented,

    At the moment there seems to be no indication that evaluation, although the law of the land, contributes anything to educational practice, other than headaches for the researcher, threats for the innovators, and depressing articles for journals devoted to evaluation. (p. 9)

It can hardly come as a surprise, then, that support for evaluation began to decline. During the Reagan Administration, the GAO (1987) found that federal evaluation received fewer resources and that "findings from both large and small studies have become less easily available for use by the Congress and the public" (p. 4). In both 1988 and 1992, the GAO prepared status reports on program evaluation to inform changing executive branch administrations at the federal level.
Evaluation Use • 9

    We found a 22-percent decline in the number of professional staff in agency program evaluation units between 1980 and 1984. A follow-up study of 15 units that had been active in 1980 showed an additional 12% decline in the number of professional staff between 1984 and 1988. Funds for program evaluation also dropped substantially between 1980 and 1984 (down by 37% in constant 1980 dollars). . . . Discussions with the Office of Management and Budget offer no indication that the executive branch investment in program evaluation showed any meaningful overall increase from 1988 to 1992. (GAO 1992a:7)

The GAO (1992a) went on to conclude that its 1988 recommendations to enhance the federal government's evaluation function had gone unheeded: "The effort to rebuild the government's evaluation capacity that we called for in our 1988 transition series report has not been carried out" (p. 7). Here, ironically, we have an evaluation report on evaluation going unused.

In 1995, the GAO provided another report to the U.S. Senate on Program Evaluation, subtitled Improving the Flow of Information to the Congress. GAO analysts conducted follow-up case studies of three major federal program evaluations: the Comprehensive Child Development Program, the Community Health Centers program, and the Chapter 1 Elementary and Secondary Education Act aimed at providing compensatory education services to low-income students. The analysts concluded that

    lack of information does not appear to be the main problem. Rather, the problem seems to be that available information is not organized and communicated effectively. Much of the available information did not reach the [appropriate Senate] Committee, or reached it in a form that was too highly aggregated to be useful or that was difficult to digest. (GAO 1995:39)

Many factors affect evaluation use in Congress (Boyer and Langbein 1991), but politics is the overriding factor (Chelimsky 1995a, 1992, 1987a, 1987b). Evaluation use throughout the U.S. federal government appears to have continued its spiral of decline through the 1990s (Wargo 1995; Popham 1995; Chelimsky 1992). In many federal agencies, the emphasis shifted from program evaluation to inspection, auditing, and investigations (N. L. Smith 1992; Hendricks et al. 1990). However, anecdotal reports from state and local governments, philanthropic foundations, and the independent sector suggest a surge of interest in evaluation. I believe that whether this initial interest and early embrace turn into long-term support and a sustainable relationship will depend on the extent to which evaluations prove useful.

Nor is the challenge only one of increasing use. "An emerging issue is that of misuse of findings. The use-nonuse continuum is a measure of degree or magnitude; misuse is a measure of the manner of use" (Alkin and House 1992:466). Marv Alkin (1991, 1990; Alkin and Coyle 1988), an early theorist of user-oriented evaluation, has long emphasized that evaluators must attend to appropriate use, not just amount of use. Ernest House (1990a), one of the most astute observers of how the evaluation profession has developed, observed in this regard: "Results from poorly conceived studies have frequently been given wide publicity, and findings from good studies have been improperly used" (p. 26). The

field faces a dual challenge then: supporting and enhancing appropriate uses while also working to eliminate improper uses.

We are called back, then, to the early morning scene that opened this chapter: decision makers lamenting the disappointing results of an evaluation, complaining that the findings did not tell them what they needed to know. For their part, evaluators complain about many things, as well, "but their most common complaint is that their findings are ignored" (Weiss 1972d:319). The question from those who believe in the importance and potential utility of evaluation remains: What has to be done to get results that are appropriately and meaningfully used? This question has taken center stage as program evaluation has emerged as a distinct field of professional practice.

Historical Context

The Emergence of Program Evaluation as a Field of Professional Practice

Like many poor people, evaluation in the United States has grown up in the "projects"—federal projects spawned by the Great Society legislation of the 1960s. When the federal government of the United States began to take a major role in alleviating poverty, hunger, and joblessness during the Depression of the 1930s, the closest thing to evaluation was the employment of a few jobless academics to write program histories. It was not until the massive federal expenditures on an awesome assortment of programs during the 1960s and 1970s that accountability began to mean more than assessing staff sincerity or political head counts of opponents and proponents. A number of events converged to create a demand for systematic empirical evaluation of the effectiveness of government programs (Walters 1996; Wye and Sonnichsen 1992), although that was often threatening to programs since many had come to associate evaluation with an attack and to think of evaluators as a program termination squad.

Education has long been a primary target for evaluation. Beginning with Joseph Rice's comparative study of spelling performance by 33,000 students in 1897, the field of educational evaluation has been dominated by achievement testing. During the Cold War, after the Soviet Union launched Sputnik in 1957, calls for better educational assessments accompanied a critique born of fear that the education gap was even larger than the "missile gap." Demand for better evaluations also accompanied the growing realization that, years after the 1954 Supreme Court Brown decision requiring racial integration of schools, "separate and unequal" was still the norm rather than the exception. Passage of the U.S. Elementary and Secondary Education Act in 1965 contributed greatly to more comprehensive approaches to evaluation. The massive influx of federal money aimed at desegregation, innovation, compensatory education, greater equality of opportunity, teacher training, and higher student achievement was accompanied by calls for evaluation data to assess the effects on the nation's children. To what extent did these changes really make an educational difference?

But education was only one arena in the War on Poverty of the 1960s. Great Society programs from the Office of Economic Opportunity were aimed at nothing less than the elimination of poverty. The creation of large-scale federal health programs, including community mental health



centers, was coupled with a mandate for evaluation, often at a level of 1% to 3% of program budgets. Other major programs were created in housing, employment, services integration, community planning, urban renewal, welfare, family programs (Weiss and Jacobs 1988), and so on—the whole of which came to be referred to as "butter" (in opposition to the "guns") expenditures. In the 1970s, these Great Society programs collided head on with the Vietnam War, rising inflation, increasing taxes, and the fall from glory of Keynesian economics. All in all, it was what sociologists and social historians, with a penchant for understatement, would characterize as "a period of rapid social and economic change."

Program evaluation as a distinct field of professional practice was born of two lessons from this period of large-scale social experimentation and government intervention: First, there is not enough money to do all the things that need doing; and, second, even if there were enough money, it takes more than money to solve complex human and social problems. As not everything can be done, there must be a basis for deciding which things are worth doing. Enter evaluation.1

While pragmatists turned to evaluation as a commonsensical way to figure out what works and is worth funding, visionaries were conceptualizing evaluation as the centerpiece of a new kind of society: the experimenting society. Donald T. Campbell ([1971] 1991) gave voice to this vision in his 1971 address to the American Psychological Association.

    The experimenting society will be one which will vigorously try out proposed solutions to recurrent problems, which will make hard-headed and multidimensional evaluations of the outcomes, and which will move on to other alternatives when evaluation shows one reform to have been ineffective or harmful. We do not have such a society today. (p. 223)

Early visions for evaluation, then, focused on evaluation's expected role in guiding funding decisions and differentiating the wheat from the chaff in federal programs. But as evaluations were implemented, a new role emerged: helping improve programs as they were implemented. The Great Society programs foundered on a host of problems: management weaknesses, cultural issues, and failure to take into account the enormously complex systems that contributed to poverty. Wanting to help is not the same as knowing how to help; likewise, having the money to help is not the same as knowing how to spend money in a helpful way. Many War on Poverty programs turned out to be patronizing, controlling, dependency generating, insulting, inadequate, misguided, overpromised, wasteful, and mismanaged. Evaluators were called on not only to offer final judgments about the overall effectiveness of programs, but to gather process data and provide feedback to help solve programming problems along the way (Sonnichsen 1989; Wholey and Newcomer 1989).

By the mid-1970s, interest in evaluation had grown to the point where two professional organizations were established: the academically oriented Evaluation Research Society and the practitioner-oriented Evaluation Network. In 1984, they merged to form the American Evaluation Association. By that time, interest in evaluation had become international, with establishment of the Canadian Evaluation Society and the Australasian Evaluation Society. In 1995, the first International Evaluation Conference included participation from new professional evaluation associations representing Central America, Europe, and the United Kingdom.

New Directions in Accountability

A predominant theme of the 1995 International Evaluation Conference was worldwide interest in reducing government programs and making remaining programs more effective and accountable. This theme first took center stage in the United States with the election of Ronald Reagan as President in 1980. He led a backlash against government programming, especially welfare expenditures. Decline in support for government programs was fueled by the widespread belief that such efforts were ineffective and wasteful. While the Great Society and War on Poverty programs of the 1960s had been founded on good intentions and high expectations, they came to be perceived as failures. The "needs assessments" that had provided the rationales for those original programs had found that the poor, the sick, the homeless, the uneducated—the needy of all kinds—needed services. So services and programs were created. Thirty years down the road from those original efforts, and billions of dollars later, most social indicators revealed little improvement. Poverty statistics—including the number of multigenerational welfare recipients and rates of homelessness, hard-core unemployment, and underemployment—as well as urban degradation and increasing crime combined to raise questions about the effectiveness of services. Reports on effective programs (e.g., Guttmann and Sussman 1995; Kennedy School of Government 1995; Schorr 1988) received relatively little media attention compared to the relentless press about waste and ineffectiveness (Wortman 1995). In the 1990s, growing concerns about federal budget deficits and runaway entitlement costs intensified the debate about the effectiveness of government programs. Both conservatives and liberals were faced with public demands to know what had been achieved by all the programs created and all the money spent. The call for greater accountability became a watershed at every level—national, state, and local; public sector, nonprofit agencies, and the private sector (Bonsignore 1996; HFRP 1996a, 1996b; Horsch 1996; Brizius and Campbell 1991).

Clear answers were not forthcoming. Few programs could provide data on results achieved and outcomes attained. Internal accountability had come to center on how funds were spent (inputs monitoring), eligibility requirements (who gets services, i.e., client characteristics), how many people get services, what activities they participate in, and how many complete the program. These indicators of inputs, client characteristics, activities, and outputs (program completion) measured whether providers were following government rules and regulations rather than whether desired results were being achieved. Control had come to be exercised through audits, licensing, and service contracts rather than through measured outcomes. The consequence was to make providers and practitioners compliance-oriented rather than results-focused. Programs were rewarded for doing the paperwork well rather than for making a difference in clients' lives.

Public skepticism turned to deep-seated cynicism. Polling data showed a widespread perception that "nothing works." As an aside, and in all fairness, this perception is not unique to the late twentieth century. In the nineteenth century, Spencer traced 32 acts of the British Parliament and discovered that 29 produced effects contrary to those intended (Edison 1983:1,5). Given today's public cynicism, 3 effective programs out of 32 might be considered a pretty good record.

More damning still, the perception has grown in modern times that no relationship exists between the amount of money spent on a problem and the results accomplished, an observation made with a sense of despair by economist John Brandl in his keynote address to the American Evaluation Association in New Orleans in 1988. Brandl, a professor in the Hubert H. Humphrey Institute of Public Affairs at the University of Minnesota (formerly its Director), was present at the creation of many human services programs during his days at the old Department of Health, Education, and Welfare (HEW). He created the interdisciplinary Evaluation Methodology training program at the University of Minnesota. Brandl later moved from being a policy analyst to being a policy formulator as a Minnesota state legislator. His opinions carry the weight of both study and experience. In his 1988 keynote address to professional evaluators, he opined that no demonstrable relationship exists between program funding levels and impact, that is, between inputs and outputs; more money spent does not mean higher quality or greater results.

In a 1994 article, Brandl updated his analysis. While his immediate focus was on Minnesota state government, his comments characterize general concerns about the effectiveness of government programs in the 1990s:

    The great government bureaucracies of Minnesota and the rest of America today are

    failing for the same reason that the formerly Communist governments in Europe fell a few years ago and Cuba is teetering today. There is no systematic accountability. People are not regularly inspired to do good work, rewarded for outstanding performance, or penalized for not accomplishing their tasks.
    In bureaus, people are expected to do well because the rules tell them to do so. Indeed, often in bureaus here and abroad, able, idealistic workers become disillusioned and burned out by a system that is not oriented to produce excellent results. No infusion of management was ever going to make operations of the Lenin shipyard in Gdansk effective.
    Maybe—I would say surely—until systematic accountability is built into government, no management improvements will do the job. (p. 13A)

Similar indictments of government effectiveness are the foundation for efforts at Total Quality Management, Re-engineering Government, or Reinventing Government. These and other management innovations make new forms of accountability—and greater use of evaluation processes and results—the centerpiece of reform. This is illustrated in Exhibit 1.2 by the premises for results-oriented government promulgated by Osborne and Gaebler (1992) in their influential and best-selling book, Reinventing Government: How the Entrepreneurial Spirit Is Transforming the Public Sector.

EXHIBIT 1.2
Premises of Reinventing Government

What gets measured gets done.
If you don't measure results, you can't tell success from failure.
If you can't see success, you can't reward it.
If you can't reward success, you're probably rewarding failure.
If you can't see success, you can't learn from it.
If you can't recognize failure, you can't correct it.
If you can demonstrate results, you can win public support.

SOURCE: From Osborne and Gaebler (1992: chapter 5, "Results-Oriented Government").

The future of evaluation is tied to the future effectiveness of programs. New calls for results-oriented, accountable programming challenge evaluators to increase the use and effectiveness of evaluations. Indictments of program effectiveness are, underneath, also indictments of evaluation. The original promise of evaluation was that it would point the way to effective programming. Later, that promise broadened to include providing ongoing feedback for improvements during implementation. Evaluation cannot be considered to have fulfilled its promise if, as is increasingly the case, the general perception is that few programs have attained desired outcomes, that "nothing works."

Such conclusions about programs raise fundamental questions about the role of evaluation. Can evaluation contribute to increased program effectiveness? Can evaluation be used to improve programs? Do evaluators bear any responsibility for use and program improvement? This book will answer these questions in the affirmative and offer utilization-focused evaluation as an approach for realizing evaluation's original vision of contributing to long-term program effectiveness and improved decision making.

Worldwide Demand for Evaluation

The challenge to evaluation extends well beyond government-supported programming. Because of the enormous size and importance of government efforts, program evaluation is inevitably affected by trends in the public sector, but evaluation has also been growing in importance in the private and independent sectors (Independent Sector 1993). Corporations, philanthropic foundations, and nonprofit agencies are increasingly turning to evaluators for help in enhancing their organizational effectiveness.

Nor is interest in empirically assessing policies and programs limited to the United States. The federal government of Canada, especially the Auditor General's Office, has demonstrated a major commitment to conducting program evaluations at both national and provincial levels (Comptroller General of Canada 1989; Rutman and Mayne 1985), and action-oriented evaluation has emerged as an important practice in many Canadian organizations (Hudson, Mayne, and Thomlison 1992). The Canadian Evaluation Society is active in promoting the appropriate practice and use of program evaluations throughout Canada, as is the Australasian Evaluation Society in Australia and New Zealand (AES 1995; Sharp 1994; Caulley 1993; Funnell 1993; Owen 1993; Sharp and Lindsay 1992). European governments are routinely using evaluation and policy analysis too, although the nature, location, and results of evaluation efforts vary from country to country (see, for example, Hoogerwerf 1985; Patton 1985). International agencies have also begun using evaluation to assess the full range of development efforts under way in Third World countries. The World Bank, UNICEF, the Australian Development Assistance Bureau (1982), and the U.S. Agency for International Development are examples of international development organizations with significant and active evaluation offices. Global interest in evaluation culminated in the first-ever International Evaluation Conference in Vancouver, Canada, in November 1995. With over 1,500 participants from 61 countries, this conference made it clear that evaluation had become a global challenge. In his keynote address to the conference, Masafumi Nagao (1995) from Japan's Sasakawa Peace Foundation challenged evaluators to think globally even as they evaluate locally, that is, to consider how international forces and trends affect project outcomes, even in small and remote communities. This book will include attention to how utilization-focused evaluation offers a process for adapting evaluation processes to address multicultural and international issues and constituencies.

Standards of Excellence for Evaluation

One major contribution of the professionalization of evaluation has been the articulation of standards for evaluation. The standards make it clear that evaluations ought to be useful.

In the past many researchers took the position that their responsibility was merely to design studies, collect data, and publish findings; what decision makers did with those findings was not their problem. This stance removed from the evaluator any responsibility for fostering use and placed all the blame for nonuse or underutilization on decision makers.

Academic aloofness from the messy world in which research findings are translated into action has long been a characteristic of basic scientific research. Before the field of evaluation identified and adopted its own standards, criteria for judging evaluations could scarcely be differentiated from criteria for judging research in the traditional social and behavioral sciences, namely, technical quality and methodological rigor. Use was ignored. Methods decisions dominated the evaluation design process. Methodological rigor meant experimental designs, quantitative data, and sophisticated statistical analysis. Whether decision makers understood such analyses was not the researcher's problem. Validity, reliability, measurability, and generalizability were the dimensions that received the greatest attention in judging evaluation research proposals and reports (e.g., Bernstein and Freeman 1975). Indeed, evaluators concerned about increasing a study's usefulness often called for ever more methodologically rigorous evaluations to increase the validity of findings, thereby supposedly compelling decision makers to take findings seriously.

By the late 1970s, however, it was becoming clear that greater methodological rigor was not solving the use problem. Program staff and funders were becoming openly skeptical about spending scarce funds on evaluations they couldn't understand and/or found irrelevant. Evaluators were being asked to be "accountable," just as program staff were supposed to be accountable. The questions emerged with uncomfortable directness: Who will evaluate the evaluators? How will evaluation be evaluated? It was in this context that professional evaluators began discussing standards.

The most comprehensive effort at developing standards was hammered out over five years by a 17-member committee appointed by 12 professional organizations, with input from hundreds of practicing evaluation professionals. The standards published by the Joint Committee on Standards in 1981 dramatically reflected the ways in which the practice of evaluation had matured. Just prior to publication, Dan Stufflebeam (1980), chair of the committee, summarized the committee's work as follows:

    The standards that will be published essentially call for evaluations that have four features. These are utility, feasibility, propriety, and accuracy. And I think it is interesting that the Joint Committee decided on that particular order. Their rationale is that an evaluation should not be done at all if there is no prospect for its being useful to some audience. Second, it should not be done if it is not feasible to conduct it in political terms, or practicality terms, or cost-effectiveness terms. Third, they do not think it should be done if we cannot demonstrate that it will be conducted fairly and ethically. Finally, if we can demonstrate that an evaluation will have utility, will be feasible, and will be proper in its conduct, then they said we could turn to the difficult matters of the technical adequacy of the evaluation. (p. 90; emphasis in the original)

EXHIBIT 1.3
Standards for Evaluation

The Utility Standards are intended to ensure that an evaluation will serve the practical information needs of intended users.

The Feasibility Standards are intended to ensure that an evaluation will be realistic, prudent, diplomatic, and frugal.

The Propriety Standards are intended to ensure that an evaluation will be conducted legally, ethically, and with due regard for the welfare of those involved in the evaluation, as well as those affected by its results.

The Accuracy Standards are intended to ensure that an evaluation will reveal and convey technically adequate information about the features that determine worth or merit of the program being evaluated.

SOURCE: Joint Committee 1994.

In 1994, revised standards were published following an extensive review spanning several years (Joint Committee 1994; Patton 1994a). While some changes were made in the 30 individual standards, the overarching framework of four primary criteria remained unchanged: utility, feasibility, propriety, and accuracy (see Exhibit 1.3). Taking the standards seriously has meant looking at the world quite differently. Unlike the traditionally aloof stance of basic researchers, evaluators are challenged to take responsibility for use. No more can we play the game of blame the resistant decision maker. Implementation of a utility-focused, feasibility-conscious, propriety-oriented, and accuracy-based evaluation requires situational responsiveness, methodological flexibility, multiple evaluator roles, political sophistication, and substantial doses of creativity, all elements of utilization-focused evaluation.

From Problem to Solution: Toward Use in Practice

This chapter has reviewed the emergence of program evaluation as a professional field of practice with standards of excellence and a mandate to be useful. The early utilization crisis called into question whether the original hopes for evaluation would be, or even could be, realized. Utilization-focused evaluation developed in response to that crisis and as a way of fulfilling, in practice, the mandate of the utility standard. With this background as context, we turn in the next chapter to an overview of utilization-focused evaluation.

Note

1. For a full discussion of evaluation's emergence as both a discipline and a field of professional practice, see House (1993).
What Is Utilization-Focused Evaluation?
How Do You Get Started?

W hen I was a child, I spake as a child, I understood as a child. I thought as a

child: but when I became an adult, I put away childish things. I decided to
become an evaluator. My only problem was, I didn't have the foggiest idea what I was get-
ting into or how to begin. ,

A modern version of an ancient Asian story (adapted from Shah 1964:64) casts light on
the challenge of searching for evaluation use.

A man found his neighbor down on his knees under a street lamp looking for
something. "What have you lost, friend?"
'"Alv key," replied the man on his knees.
.I'/CT .7 few minutes of helping him search, the neighbor asked, "Where did you
diop it?"
"In that dark pasture," answered his friend.
"Then why, for heaven's sake, are you looking here?"
"Because there is more light here."


The obvious place to look for use is in what happens after an evaluation is completed and there's something to use. What we shall find, however, is that the search for use takes us into the "dark pasture" of decisions made before any data are ever collected. The reader will find relatively little in this book about what to do when a study is over. At that point, the potential for use has been largely determined. Utilization-focused evaluation emphasizes that what happens from the very beginning of a study will determine its eventual impact long before a final report is produced.

A Comprehensive Approach

The question of how to enhance the use of program evaluation is sufficiently complex that a piecemeal approach based on isolated prescriptions for practice is likely to have only piecemeal impact. Overviews of research on evaluation use (e.g., Huberman 1995; Lester and Wilds 1990; Connor 1988; Greene 1988b; McLaughlin et al. 1988; M. F. Smith 1988; Cousins and Leithwood 1986; Leviton and Hughes 1981) suggest that the problems of underuse will not be solved by compiling and following some long list of evaluation axioms. It's like trying to live your life according to Poor Richard's Almanac. At the moment of decision, you reach into your socialization and remember, "He who hesitates is lost." But then again, "Fools rush in where angels fear to tread." Advice to young evaluators is no less confusing: "Work closely with decision makers to establish trust and rapport," but "maintain distance to guarantee objectivity and neutrality." Real-world circumstances are too complex and unique to be routinely approached through the application of isolated pearls of evaluation wisdom. What is needed is a comprehensive framework within which to develop and implement an evaluation with attention to use built in. In program evaluation, as in life, it is one's overall philosophy integrated into pragmatic principles that provides a guide to action. Utilization-focused evaluation offers both a philosophy of evaluation and a practical framework for designing and conducting evaluations.

Since its original publication in 1978, Utilization-Focused Evaluation has been tested and applied in thousands of evaluations in the United States and throughout the world. This reservoir of experience provides strong confirmation that evaluations will be used if the foundation for use is properly prepared. Evidence to that effect will be presented throughout this book. First, let me outline the utilization-focused approach to evaluation and indicate how it responds to the challenge of getting evaluations used.

Utilization-Focused Evaluation

Utilization-focused evaluation begins with the premise that evaluations should be judged by their utility and actual use; therefore, evaluators should facilitate the evaluation process and design any evaluation with careful consideration of how everything that is done, from beginning to end, will affect use. Nor is use an abstraction. Use concerns how real people in the real world apply evaluation findings and experience the evaluation process. Therefore, the focus in utilization-focused evaluation is on intended use by intended users.

In any evaluation, there are many potential stakeholders and an array of possible uses. Utilization-focused evaluation requires moving from the general and abstract, that is, possible audiences and potential uses, to the real and specific: actual primary intended users and their explicit commitments to concrete, specific uses. The evaluator facilitates judgment and decision making by intended users rather than acting as a distant, independent judge. Since no evaluation can be value-free, utilization-focused evaluation answers the question of whose values will frame the evaluation by working with clearly identified, primary intended users who have responsibility to apply evaluation findings and implement recommendations. In essence, I shall argue, evaluation use is too important to be left to evaluators.

Utilization-focused evaluation is highly personal and situational. The evaluation facilitator develops a working relationship with intended users to help them determine what kind of evaluation they need. This requires negotiation: The evaluator offers a menu of possibilities within the framework of established evaluation standards and principles. While concern about utility drives a utilization-focused evaluation, the evaluator must also attend to the evaluation's accuracy, feasibility, and propriety (Joint Committee on Standards 1994). Moreover, as a professional, the evaluator has a responsibility to act in accordance with the profession's adopted principles of conducting systematic, data-based inquiries; performing competently; ensuring the honesty and integrity of the entire evaluation process; respecting the people involved in and affected by the evaluation; and being sensitive to the diversity of interests and values that may be related to the general and public welfare (AEA Task Force 1995:20; see Exhibit 2.1).

EXHIBIT 2.1
Guiding Principles for Evaluators

Systematic Inquiry
Evaluators conduct systematic, data-based inquiries about what is being evaluated.

Competence
Evaluators provide competent performance to stakeholders.

Integrity/Honesty
Evaluators ensure the honesty and integrity of the entire evaluation process.

Respect for People
Evaluators respect the security, dignity, and self-worth of the respondents, program participants, clients, and other stakeholders with whom they interact.

Responsibilities for General and Public Welfare
Evaluators articulate and take into account the diversity of interests and values that may be related to the general and public welfare.

SOURCE: American Evaluation Association Guiding Principles for Evaluators, Shadish et al. 1995.

Utilization-focused evaluation does not advocate any particular evaluation content, model, method, theory, or even use. Rather, it is a process for helping primary intended users select the most appropriate content, model, methods, theory, and uses for their particular situation. Situational responsiveness guides the interactive process between evaluator and primary intended users. This book will present and discuss the many options now available in the feast that has become the field of evaluation. As we consider the rich and varied menu of evaluation, it will become clear that utilization-focused evaluation can include any evaluative purpose (formative, summative, developmental), any kind of data (quantitative, qualitative, mixed), any kind of design (e.g., naturalistic, experimental), and any kind of focus (processes, outcomes, impacts, costs, and cost-benefit, among many possibilities). Utilization-focused evaluation is a process for making decisions about these issues in collaboration with an identified group of primary users focusing on their intended uses of evaluation.

A psychology of use undergirds and informs utilization-focused evaluation. In essence, research and my own experience indicate that intended users are more likely to use evaluations if they understand and feel ownership of the evaluation process and findings; they are more likely to understand and feel ownership if they've been actively involved; and by actively involving primary intended users, the evaluator is training users in use, preparing the groundwork for use, and reinforcing the intended utility of the evaluation every step along the way. The rest of this chapter will offer some ways of working with primary intended users to begin the process of utilization-focused evaluation. Beyond the heuristic value of these examples, they are meant to illustrate how the philosophy of utilization-focused evaluation is translated into practice.

The First Challenge: Engendering Commitment

Utilization-focused evaluators begin their interactions with primary intended users by working to engender commitments to both evaluation and use. Even program funders and decision makers who request or mandate an evaluation often don't know what evaluation involves, at least not in any specific way. And they typically haven't thought much about how they will use either the process or the findings.

In working with program staff and administrators to lay the groundwork for an evaluation, I often write the word evaluate on a flip chart and ask those present to free-associate with the word. They typically begin with synonyms or closely related terms: assess, measure, judge, rate, compare. Soon they move to connotations and feelings: waste, crap, cut our funding, downsize, attack, demean, put down, pain, hurt, fear.

Clearly, evaluation can evoke strong emotions, negative associations, and genuine fear. To ignore the perceptions, past experiences, and feelings stakeholders bring to an evaluation is like ignoring a smoldering dynamite fuse in hope it will burn itself out. More likely, unless someone intervenes and extinguishes the fuse, it will burn faster and eventually explode. Many an evaluation has blown up in the face of well-intentioned evaluators because they rushed into technical details and methods decisions without establishing a solid foundation for the evaluation in clear purposes and shared understandings. To begin, both

evaluators and those with whom we work need to develop a shared definition of evaluation and mutual understanding about what the process will involve.

What Is Program Evaluation?

I offer the clients with whom I work the following definition:

    Program evaluation is the systematic collection of information about the activities, characteristics, and outcomes of programs to make judgments about the program, improve program effectiveness, and/or inform decisions about future programming. Utilization-focused program evaluation (as opposed to program evaluation in general) is evaluation done for and with specific, intended primary users for specific, intended uses.

The general definition above has three interrelated components: (1) the systematic collection of information about (2) a potentially broad range of topics (3) for a variety of possible judgments and uses. The definition of utilization-focused evaluation adds the requirement to specify intended use by intended users.

This matter of defining evaluation is of considerable import because different evaluation approaches rest on different definitions. The use-oriented definition offered above contrasts in significant ways with other approaches. One traditional approach has been to define program evaluation as determining the extent to which a program attains its goals. However, as we shall see, program evaluation can and does involve examining much more than goal attainment, for example, implementation, program processes, unanticipated consequences, and long-term impacts. Goal attainment, then, takes too narrow a focus to encompass the variety of ways program evaluation can be useful.

Another common definition states that evaluation determines the worth, merit, or value of something (Joint Committee on Standards 1994; House 1993:1; Scriven 1991a:139). This admittedly commonsensical definition omits specifying the basis for determining merit or worth (that is, systematically collected data) or the purposes for making such a determination (program improvement, decision making, or knowledge generation). In advocating for this narrow and simple definition of evaluation, Stufflebeam (1994) warned against "obscuring the essence of evaluation—to assess value—by overemphasizing its constructive uses" (p. 323). However, for me, use is the essence, so I choose to include it in my definition as a matter of emphasis to reinforce the point that concern about use is a distinguishing characteristic of program evaluation, even at the point of defining what program evaluation is. I'm not interested in determining merit or worth as an end in itself. I want to keep before us the questions: Why is merit or worth to be judged? What will be done with whatever judgments are made?

A different approach is represented by the widely used Rossi and Freeman (1993) textbook, Evaluation: A Systematic Approach. They define evaluation research as the systematic application of social research procedures in assessing social intervention programs. But notice, they are defining evaluation research, and their text emphasizes applying social science methods, so naturally they include that in their definition of evaluation.

The definition of evaluation I've offered here emphasizes systematic data collection rather than applying social science methods. This is an important distinction in

emphasis, one in keeping with the Principle of Systematic Inquiry adopted by the American Evaluation Association (AEA Task Force on Guiding Principles 1995:22). From my perspective, program evaluators may use research methods to gather information, but they may also use management information system data, program monitoring statistics, or other forms of systematic information that are not research-oriented. Program evaluation differs fundamentally from research in the purpose of data collection and standards for judging quality. Basic scientific research is undertaken to discover new knowledge, test theories, establish truth, and generalize across time and space. Program evaluation is undertaken to inform decisions, clarify options, identify improvements, and provide information about programs and policies within contextual boundaries of time, place, values, and politics. The difference between research and evaluation has been called by Cronbach and Suppes (1969) the difference between conclusion-oriented and decision-oriented inquiry. Research aims to produce knowledge and truth. Useful evaluation supports action. The evaluation research of Rossi and Freeman is a hybrid that tends, in my reading of it, to be more knowledge-oriented than action-oriented.

Stake (1981) and Cronbach (1982) have emphasized that evaluation differs from research in the relative importance attached to making generalizations. In any data collection effort, the extent to which there is concern about utility, generalizability, scientific rigor, and relevance of the findings to specific users will vary. Each of these dimensions is a continuum. Because

this book emphasizes meeting the information needs of specific intended users, the focus will most often be on program evaluation rather than evaluation research. This focus derives from my work with small, community-based programs where the idea of conducting "research" may be intimidating or where practitioners consider research "academic and irrelevant." On the other hand, national programs or those staffed or funded by people with advanced degrees may attach positive associations to conducting research, in which case they may prefer to call the process evaluation research. The language, like everything else in utilization-focused evaluation, depends on the program context and the explicit needs and values of primary intended users.

In short, how to define evaluation and what to call a particular evaluation are matters for discussion, clarification, and negotiation.

What is not negotiable is that the evaluation be data-based. Both program evaluation and evaluation research bring an empirical perspective to bear on questions of policy and program effectiveness. This data-based approach to evaluation stands in contrast to two alternative and often competing ways of assessing programs: the charity orientation and pure pork barrel politics. I sometimes introduce these distinctions in working with clients to help them more fully appreciate the sine qua non nature of evaluation's commitment to systematic data collection.

Charitable Assessment

    And now abideth faith, hope, charity, these three; but the greatest of these is charity.
    —Paul's First Letter to the Corinthians

Modern social service and education programs are rooted in charitable and philanthropic motives: helping people. From a charity perspective, the main criterion for evaluation is the sincerity of funders and program staff; the primary measure of program worth is that the program organizers care enough to try their very best to help the less fortunate. As an agency director told me after a measurement training session, "All I want to know is whether or not my staff are trying their best. When you've got a valid and reliable and all-that-other-stuff instrument for love and sincerity, come back and see me."

Sometimes religious motives can also be found in this mix. As a United Way agency director once told me, "God has mandated our helping the less fortunate, so God alone will judge the outcomes and effectiveness of our efforts." The implication was that God needed no assistance from the likes of social scientists, with their impersonal statistics and objective analyses of human suffering.

Data-oriented evaluators have little to offer those who are fully ensconced in charitable assessment. Others, however (and their numbers are increasing), have come to believe that, even for the sincere,

indeed especially for the sincere and caring, empirically based program evaluations can be valuable. After all, sincerity and caring mean that one wants to do a good job, wants to be effective, and wants to make a difference. The purpose of program evaluation is precisely that—to increase effectiveness and provide information on whether hopes are actually being realized. People who really care about their work are precisely the people who can benefit greatly from utilization-focused program evaluation.

Pork Barrel Assessment

A second historically important approach to evaluating programs has been pork barrel politics, which takes as its main criterion the political power of a program's constituency: If powerful constituents want the program, or if more is to be gained politically by support for, rather than opposition to, the program, then the program is judged worthwhile; no other evidence of program effectiveness is needed, although data may be sought to support this predetermined political judgment. Pork barrel evaluations are one reason it is so difficult to terminate government-funded programs and agencies. Programs rapidly develop constituencies whose vested interests lie in program continuation. The driving force of the pork barrel approach is to give out money where it counts politically, not where it will be used most effectively.

The pork barrel criterion is not unique to elected politicians and governmental bodies. The funding boards of philanthropic foundations, corporate boards, and service agencies have their own constituencies to please. Political debts must be paid, so programs are judged effective as long as they serve powerful interests. Empirical evaluation findings are of interest only insofar as they can be manipulated for political and public relations purposes. (Chapter 14 will address in more depth the relationship, often healthy when properly approached, between politics and evaluation.)

Learning to Value Evaluation

So, we're working on engendering commitment to data-based evaluation and use. We want to get beyond charitable assessments and pork barrel assessments. Research on "readiness for evaluation" (D. S. Smith 1992; Studer 1978; Mayer 1976, 1975) has found that "valuing evaluation" is a necessary condition for evaluation use (see Exhibit 2.2). Valuing evaluation cannot be taken for granted. Nor does it happen naturally. Users' commitment to evaluation is typically fragile, often whimsical, and must be cultivated like a hybrid plant that has the potential for enormous yields, but only if properly cared for, nourished, and appropriately managed.

EXHIBIT 2.2
Items on Belief in Program Evaluation, From Readiness for Evaluation Questionnaire

Items are listed in rank order by factor loading (shown in parentheses).

1. Program evaluation would pave the way for better programs for our clientele (.777)
2. This would be a good time to begin (or renew or intensify) work on program evaluation (.732)
3. Installing a procedure for program evaluation would enhance the stature of our organization (.723)
4. We don't need to have our program evaluated (-.689)
5. The amount of resistance in the organization to program evaluation should not be a deterrent to pursuing a policy of program evaluation (.688)
6. I have yet to be convinced of the alleged benefits of program evaluation (-.669)
7. Program evaluation would only increase the workload (-.668)
8. "Program evaluation" and "accountability" are just fads that hopefully will die down soon (-.650)
9. Program evaluation would tell me nothing more than I already know (-.645)
10. I would be willing to commit at least 5% of the program budget for evaluation (.624)
11. A formal program evaluation would make it easier to convince administrators of needed changes (.617)
12. We could probably get additional or renewed funding if we carry out a plan for program evaluation (.587)
13. Program evaluation might lead to greater recognition and rewards to those who deserve it (.548)
14. It would be difficult to implement a procedure for program evaluation without seriously disrupting other activities (-.518)
15. No additional time and money can be made available for program evaluation (-.450)
16. Most of the objections one hears about program evaluation are really pretty irrational (.442)
17. Some money could probably be made available to provide training to staff in program evaluation skills (.385)

SOURCE: Smith 1992:53-54.

NOTE: Factor analysis is a statistical technique for identifying questionnaire or test items that are highly intercorrelated and therefore may measure the same factor, in this case, belief in evaluation. The positive or negative signs on the factor loadings reflect whether questions were worded positively or negatively; the higher a factor loading, the better the item defines the factor.

Reality Testing

I find the idea of "reality testing" helpful in working with intended users to increase the value they attach to evaluation and, correspondingly, their willingness to be actively engaged in the work necessary to make the evaluation useful. I include in the notion of testing reality gathering varying perceptions of reality in line with the axiom that "what is perceived as real is real in its consequences."1 The phrase "reality testing" implies that being "in touch with reality" can't simply be assumed. When individuals lose touch with reality, they become dysfunctional, and, if the distortions of reality are severe, they may be referred for psychotherapy. Programs and organizations can also "lose touch with reality" in the sense that the people in those programs and organizations are operating on myths and behaving in ways that are dysfunctional to goal attainment and ineffective for accomplishing desired outcomes. Program evaluation can be a mechanism for finding out whether what's supposed to be or hoped to be going on is, in fact, going on—a form of reality testing.

Some people would just as soon not be bothered dealing with programmatic or organizational reality. They've constructed their own comfortable worlds built on untested assumptions and unexamined beliefs. Evaluation is a threat to such people. Evaluators who ignore the threatening nature of reality testing and plow ahead with their data collection in the hope that knowledge will prevail are engaged in their own form of reality distortion. Utilization-focused evaluators, in contrast, work with intended evaluation users to help them understand the value of reality testing and buy into the process, thereby reducing the threat of evaluation and resistance (conscious or unconscious) to evaluation use. One way to do this is to look for and use examples from the news of good ideas that haven't worked out. Exhibit 2.3 presents an example I've used with several groups.

EXHIBIT 2.3
Reality Testing: Example of a Good Idea That Didn't Work Out

The Robert Wood Johnson Foundation funded an eight-year effort to establish and evaluate new ways of helping doctors and patients deal with death in hospitals. Called SUPPORT (Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatment), the project placed nurses in five teaching hospitals to facilitate communications between physicians and families facing the death of a family member. The idea was that by increasing doctors' understanding of what patients and their families wanted and didn't want, pain could be diminished, the appropriateness of care would increase, and fewer "heroic measures" would be used to prolong life for short periods.

The evaluation found that the culture of denial about death could not be overcome through better communication. Major gaps remained between what patients privately said they wanted and what doctors, dedicated to saving lives, did. Living wills didn't help. Half the patients still died in pain. Many died attached to machines, and died alone.

Dr. Joanne Lynn, a co-director of the project, expressed dramatically the importance of testing good ideas in practice to see if they really work: "We did what everyone thought would work and it didn't work at all, not even a quiver."

While the idea didn't work, important lessons were learned, she concluded. "This wasn't a group of doctors dedicated to finding the last possible date on the tombstone. What we learned was that the conspiracy of silence about death was stronger than we expected, and the force of habit was also stronger than we expected. We are all involved in the dance of silence."

NOTE: Quotations attributed to Dr. Lynn are taken from Goodman 1995:17A.
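The note to Exhibit 2.2 says that a factor loading's sign tracks an item's wording and its magnitude reflects how well the item defines the factor. For readers unfamiliar with factor analysis, here is a small, purely illustrative sketch (synthetic data, not Smith's; the helper name `loading` is mine): in the simplest reading, a loading behaves like the correlation between responses to an item and the underlying factor, here "belief in evaluation."

```python
import math
import random

random.seed(1)  # deterministic synthetic "survey"

n = 500
# Latent factor: each respondent's underlying belief in evaluation.
belief = [random.gauss(0, 1) for _ in range(n)]

# Responses to a positively and a negatively worded item: the latent belief
# (or its negation) plus some response noise.
item_positive = [b + random.gauss(0, 0.6) for b in belief]   # e.g., "would pave the way for better programs"
item_negative = [-b + random.gauss(0, 0.6) for b in belief]  # e.g., "would only increase the workload"

def loading(item, factor):
    """Pearson correlation of an item with the factor score."""
    mi = sum(item) / len(item)
    mf = sum(factor) / len(factor)
    cov = sum((x - mi) * (y - mf) for x, y in zip(item, factor))
    var_i = sum((x - mi) ** 2 for x in item)
    var_f = sum((y - mf) ** 2 for y in factor)
    return cov / math.sqrt(var_i * var_f)

print(round(loading(item_positive, belief), 2))  # strongly positive
print(round(loading(item_negative, belief), 2))  # strongly negative
```

As in the exhibit's note, the positively worded item loads positively and the negatively worded item loads negatively, and noisier items would load closer to zero. Real factor analysis must estimate the latent factor from the item intercorrelations rather than observe it directly, but the correlational reading of a loading is the same.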

As I work with intended users to agree on what we mean by evaluation and engender a commitment to use, I invite them to assess incentives for and barriers to reality testing and information use in their own program culture. Barriers typically include fear of being judged, cynicism about whether anything can really change, skepticism about the worth of evaluation, concern about the time and money costs of evaluation, and frustration from previous bad evaluation experiences, especially lack of use. As we work through these and related issues to "get ready for evaluation," the foundation for use is being built in conjunction with a commitment to serious and genuine reality testing. Because evaluators have typically internalized the value of data-based reality testing, it is easy to assume that others share this perspective. But a commitment to examine beliefs and test actual goal attainment is neither natural nor widespread. People involved in program management and service delivery can become quite complacent about what they're doing and quite content with the way things are. Reality testing will only upset things. "Why bother?" they ask.

Nor is it enough that an evaluation is required by some funder or oversight authority. Indeed, under such conditions, evaluation often becomes an end in itself, something to be done because it is mandated, not because it will be useful or because important things can be learned. Doing an evaluation because it is required is entirely different from doing it because one is committed to grounding decisions and action in a careful assessment of reality. Ironically, mandated evaluations can actually undercut utility by making the reason for the evaluation compliance with a funding requirement rather than genuine interest in being more effective.

Because evaluation use is so dependent on the commitment to reality testing, evaluators need ways to cultivate that commitment and enlarge the capacity of intended users to undertake the process. This means engaging program staff, managers, funders, and other intended users in examining how their beliefs about program effectiveness may be based on selective perception, predisposition, prejudice, rose-colored glasses, unconfirmed assertions, or simple misinformation. The irony of living in the information age is that we are surrounded by so much misinformation and act on so many untested assumptions. By putting intended users in touch with how little they really know, and how flimsy is the basis for much of what they think they know, we are laying the groundwork for use. We are, in fact, identifying that there are useful things to be found out and creating the expectation that testing reality will be a valuable activity, not just an academic or mandated exercise. In short, we are establishing the program's readiness for evaluation.

Generating Real Questions

One way of facilitating a program's readiness for evaluation is to take primary intended users through a process of generating meaningful evaluation questions. I find that when I enter a new program setting as an external evaluator, the people with whom I'm working typically expect me to tell them what the focus of the evaluation will be. They're passively waiting to be told by the evaluation expert—me—what questions the evaluation will answer. But I don't come with specific evaluation questions. I come with a process for generating their questions. Taking them through that process is aimed at engendering their commitment to data-based evaluation and use. Let me share an example.

The Frontier School Division in Manitoba, Canada, encompasses much of northern Manitoba—a geographically immense school district. The Deputy Minister of Education in Manitoba thought evaluation might be a way to shake things up in a district he considered stagnant, so he asked me to facilitate an evaluation process with district officials. The actual form and content of the evaluation were to be determined internally, by them. So I went up to Winnipeg and met with the division administrators, a representative from the parents' group, a representative from the principals' group, and a representative from the teachers' union. I had asked that all constituencies be represented in order to establish credibility with all the people who might be involved in using the evaluation.

Inasmuch as I had been brought in from outside by a superordinate official, it was not surprising that I encountered reactions ranging from defensiveness to outright hostility. They had not asked for the evaluation, and the whole idea sounded unsavory and threatening.

I began by asking them to tell me what kinds of things they were interested in evaluating. The superintendent frowned and responded, "We'd like to see the evaluation instruments you've used in assessing other school districts."

I replied that I would be happy to share such instruments if they should prove relevant, but it would be helpful to first determine the evaluation issues and priorities of Frontier School Division. They looked skeptical, and after a lingering silence, the superintendent tried again: "You don't need to show us all the instruments you intend to use. Just show us one so we have an idea of what's going to happen."

I again replied that it was too early to talk about instruments. First, we had to identify their evaluation questions and concerns. Then we would talk about instruments. However, their folded arms and scowling faces told me that what they interpreted as my evasiveness was only intensifying their initial suspicions and fears. I was deepening their resistance by what they perceived as my secretiveness about the content of my evaluation scheme. The superintendent tried again: "How about just showing us one part of the evaluation, say the part that asks teachers about administrative effectiveness."

At that point I was about to throw in the towel, give them some old instruments, and let them use what they wanted from other evaluations. But first, I made one more attempt to get at their issues. I said, "Look, maybe your questions will be the same as questions I've used on surveys elsewhere. But I'm not even sure at this point that any kind of survey is appropriate. Maybe you don't need an evaluation. I certainly don't have any questions I need answered about your operations and effectiveness. Maybe you don't either. In which case, I'll tell the Deputy Minister that evaluation isn't the way to go. But before we decide to quit, let me ask you to participate in a simple little exercise. It's an old complete-the-blank exercise from grade school." I then turned to the chalkboard and wrote a sentence in capital letters.

I WOULD REALLY LIKE TO KNOW __________ ABOUT FRONTIER SCHOOL DIVISION.

I turned back to them and continued, "I want to ask each of you, individually, to complete the blank 10 times. What are 10 things about Frontier School Division that you'd like to know, things you aren't
What Is Utilization-Focused Evaluation? • 31

certain about, that would make a differ- Another question concerned the rela-
ence in what you do if you had more tionship between the classroom and the
information? Take a shot at it, without community. Both the teacher and parent
regard to methods, measurement, design, representatives said that nobody had ever
resources, precision—just 10 basic ques- thought about that in any real way: "We
tions, real questions about this division." don't have any policy about that. We don't
After about 10 minutes I divided the know what goes on in the different schools.
participants into three groups of four peo- That would be important for us to know."
ple each and asked them to combine their We spent the rest of the day refining
lists into a single list of 10 things that each questions, prioritizing, formalizing evalu-
group wanted to know—in effect, to estab- ation procedures, and establishing an
lish each group's priority questions. Then agenda for the evaluation process. The hos-
we pulled back together and generated a tility had vanished. By the end of the day
single list of 10 basic evaluation questions they were anxious to have me make a com-
—answers to which, they agreed, could mitment to return. They had become ex-
make a real difference to the operations of cited about doing their evaluation. The
Frontier School Division. evaluation had credibility because the ques-
The questions they generated were the tions were their questions. A month later,
kind an experienced evaluator could antici- they found out that budget shifts in the
pate being asked in a districtwide educa- Ministry meant that the central govern-
tional evaluation because there are only so ment would not pay for the evaluation. The
many things one can ask about a school Deputy Minister told them that they could
division. But the questions were phrased in scrap the evaluation if they wanted to, but
their terms, incorporating important local they decided to pay for it out of local
nuances of meaning and circumstance. division funds.
Most important, they had discovered that The evaluation was completed in close
they had questions they cared about—not cooperation with the task force at every
my questions but their questions, because step along the way. The results were dis-
during the course of the exercise it had seminated to all principals and teachers.
become their evaluation. The whole atmos- The conclusions and recommendations
phere had changed. This became most evi- formed the basis for staff development con-
dent as I read aloud the final list of 10 items ferences and division policy sessions. The
they had generated that morning. One item evaluation process itself had an impact on
read, "How do teachers view the effective- the Division. Over the last several years,
ness of administrators and how often do Frontier School Division has gone through
they think administrators ought to come many changes. It is a very different place in
into classrooms?" One of the administra- terms of direction, morale, and activity
tors who had been most hostile at the outset than it was on my first visit. Not all those
said, "That would be dynamite informa- changes were touched on in the evaluation,
tion. We have no idea at all what teachers nor are they simply a consequence of the
think about us and what we do. I have no evaluation. But generating a list of real and
idea if they want me in their classrooms or meaningful evaluation questions played a
not, or how often they think I ought to visit. critical part in getting things started. Ex-
That could turn my job around. That hibit 2.4 offers criteria for good utilization-
would be great to know." focused questions.

EXHIBIT 2.4
Criteria for Utilization-Focused Evaluation Questions

1. Data can be brought to bear on the question; that is, it is truly an empirical question.
2. There is more than one possible answer to the question; that is, the answer is not predetermined by the phrasing of the question.
3. The primary intended users want information to help answer the question. They care about the answer to the question.
4. The primary users want to answer the question for themselves, not just for someone else.
5. The intended users can indicate how they would use the answer to the question; that is, they can specify the relevance of an answer to the question for future action.

Communicating Professional Commitment to Use From the Beginning

The criterion I offered the primary intended users in Winnipeg for generating meaningful questions was "Things you'd like to know that would make a difference to what you do." This criterion emphasizes knowledge for action—finding out things that can be used. But generating a list of potentially useful questions is only one way to start interacting with primary users. How one begins depends on what backgrounds, experiences, preconceptions, and relationships the primary users bring to the table. In Winnipeg, I needed to get the group engaged quickly in reframing how they were thinking about my role because their resistance was so palpable and because we didn't have much time.

With a seemingly more neutral group, one that is neither overtly hostile nor enthusiastic (Yes, some groups are actually enthusiastic at the beginning!), I may begin, as I noted earlier in this chapter, by asking participants to share words and feelings they associate with evaluation. Then, we explore how this "baggage" they've brought with them may affect their expectations about the evaluation's likely utility. As we work toward a shared definition of evaluation and a clear commitment to use, I look for opportunities to review the development of program evaluation as a field of professional practice and present the standards for and principles of evaluation (see the index). This material, presented earlier in this chapter and in Chapter 1, communicates to primary intended users that you, as the evaluator, are a professional—part of an established profession—and that, as such, you have an obligation to facilitate and conduct evaluations in accordance with professional standards and principles, including priority attention to utility.

Few non-evaluators are aware of the field's professional associations, conferences, journals, standards, and principles. By associating your effort with the larger profession, you can elevate the status, seriousness, and meaningfulness of the process you are facilitating and help the primary intended users understand the sources of wisdom you are drawing on and applying
EXHIBIT 2.5
Themes of Annual American Evaluation Association National Conferences

1986 What Have We Learned?
1987 The Utilization of Evaluation
1988 Evaluation and Politics
1989 International and Cross-Cultural Perspectives
1990 Evaluation and Formulation of Public Policy
1991 New Evaluation Horizons
1992 Synthesizing Evaluation: Perspectives, Practices, and Evidence
1993 Empowerment Evaluation
1994 Evaluation and Social Justice
1995 Evaluation for a New Century: A Global Perspective
1996 A Decade of Progress: Looking Back and Looking Forward
1997 Evaluation Theory Informing Practice, Practice Informing Theory

as you urge them to attend carefully to utilization issues from the start. Thus, the history of the profession presented in the first chapter can be shared with intended users to communicate the larger context within which any particular evaluation takes place and to show sophistication about the issues the profession has focused on over time (see Exhibit 2.5). I consider this so important that I have students practice 10-minute minilectures on the development of evaluation as a field of professional practice, one guided by standards and principles (see Exhibits 1.3 and 2.1), so they can hold forth at a moment's notice, whether the opportunity be a workshop or a cocktail party.

Creative Beginnings

Authors of all races, be they Greeks, Romans, Teutons, or Celts, can't seem just to say that anything is the thing it is; they have to go out of their way to say that it is like something else.

—Ogden Nash

With easy-going, relaxed groups that seem open to having some fun, I'll often begin with a metaphor exercise. Metaphors, similes, and analogies help us make connections between seemingly unconnected things, thereby opening up new possibilities by unveiling what had been undetected. Bill Gephart (1981), in his 1980 presidential address to evaluators, drew an analogy between his work as a watercolor artist and his work as an evaluator. Gephart compared the artist's efforts to "compel the eye" to the evaluator's efforts to "compel the mind." Both artist and evaluator attempt to focus the attention of an audience by highlighting some things and keeping other things in the background. He also examined the ways in which the values of an audience (of art critics or program decision makers) affect what they see in a finished piece of work.

Nick Smith (1981) directed a Research on Evaluation Program in which he and others thought about evaluators as poets, architects, photographers, philosophers, operations analysts, and artists. They consciously and creatively used metaphors and analogies to understand and elaborate the many functions of program evaluation. Use of these forms of figurative speech can help evaluators communicate the nature and practice of evaluation. Many of the problems encountered by evaluators, much of the resistance to evaluation, and many failures of use occur because of misunderstandings and communications problems. What we often have, between evaluators and non-evaluators, is a classic "failure to communicate." One reason for such failures is that the language of research and evaluation—the jargon—is alien to many laypersons, decision makers, and stakeholders. From my point of view, the burden for clear communications rests on the evaluator. It is the evaluator who must find ways of bridging the communications gap.

To help intended users and stakeholders understand the nature of evaluation, I like to ask them to construct metaphors and similes about evaluation. This exercise helps participants in the process discover their own values concerning evaluation while also giving them a mechanism to communicate those values to others. The exercise can be used with a program staff, an evaluation task force, evaluation trainees, workshop participants, or any group for whom it might be helpful to clarify and share perceptions about evaluation. The exercise goes like this.

One of the things that we'll need to do during the process of working together is come to some basic understandings about what evaluation is and can do. In my experience, evaluation can be a very creative and energizing experience. In particular, interpreting and using evaluation findings for program improvement requires creativity and openness to a variety of possibilities. To help us get started on this creative endeavor, I'm going to ask you to participate with me in a little exercise.

In this box I have a bunch of toys, household articles, office supplies, tools, and other miscellaneous gadgets and thingamajigs that I've gathered from around my house. I'm going to dump these in the middle of the table and ask each of you to take one of them and use that item to make a statement about evaluation. Evaluation is like _____ because . . .

To illustrate what I want people to do, I offer to go first. I ask someone to pick out any object in the room that I might use for my metaphor. What follows are some examples from actual workshops:

Someone points to a coffee cup: "This cup can be used to hold a variety of things. The actual contents of the cup will vary depending on who is using it and for what purpose
they are using it. Utilization-focused evaluation is a process like this cup; it provides a form but is empty until the group of people working on the evaluation fill it with focus and content and substance. The potential of the cup cannot be realized until it holds some liquid. The potential of utilization-focused evaluation cannot be realized until it is given the substance of a concrete evaluation problem and situation. One of the things that I'll be doing as we work together is providing an evaluation framework like this cup. You will provide the substance."

Someone points to a chalkboard: "Evaluation is like a chalkboard because both are tools that can be used to express a variety of different things. The chalkboard itself is just an empty piece of slate until someone writes on it and provides information and meaning by filling in that space. The chalkboard can be filled up with meaningless figures, random marks, obscene words, mathematical formulas, or political graffiti—or the board can be filled with meaningful information, insights, helpful suggestions, and basic facts. The people who write on the chalkboard carry the responsibility for what it says. The people who fill in the blanks in the evaluation and determine its content and substance carry the responsibility for what the evaluation says. The evaluation process is just a tool to be used—and how it is used will depend on the people who control the process—in this case, you."

I'll typically take a break at this point and give people about 10 minutes to select an item and think about what to say. If there are more than 10 people in the group, I will break the larger group into small groups of 5 or 6 for sharing analogies and metaphors so that each person is given an opportunity to make an evaluation statement. Below are some examples from actual workshops.

This empty grocery bag is symbolic of my feelings about evaluation. When I think about our program being evaluated, I want to find someplace to hide, and I can put this empty bag over my head so that nobody can see me and I can't see anything else, and it gives me at least the feeling that I'm able to hide. (She puts the bag over her head.)

Evaluation can be like this toothbrush. When used properly it gets out the particles between the teeth so they don't decay. If not used properly, if it just lightly goes over the teeth or doesn't cover all the teeth, then some of the gunk will stay on and cause the teeth to decay. Evaluation should help get rid of any things that are causing a program to decay so it stays healthy.

Evaluation for me is like this rubber ball. You throw it down and it comes right back at you. Every time I say to my staff we ought to evaluate the program, they throw it right back at me and they say, "you do the evaluation."

Evaluation is like this camera. It lets you take a picture of what's going on, but it can only capture what you point it at, and only at a particular point in time. My concern about this evaluation is that it won't give the whole picture, that an awful lot may get left out.

Evaluation for me is like this empty envelope. You can use it to send a message to someone. I want to use evaluation to send a message to our funders about what we're doing in the program. They don't have any idea about what we actually do. I just hope they'll read the letter when they get it.

Evaluation for me is like this adjustable wrench. You can use this wrench to tighten nuts and bolts to help hold things together. If used properly and applied with the right amount of pressure, it holds things together very well. If you tighten the bolt too hard, however, you can break the bolt, and the whole thing will fall apart. I'm in favor of evaluation if it's done right. My concern is that you can overdo it and the program can't handle it.

The process of sharing is usually accompanied by laughter and spontaneous elaborations of favorite metaphors. It's a fun process that offers hope the evaluation process itself may not be quite as painful as people thought it would be. In addition, participants are often surprised to find that they have something to say. They are typically quite pleased with themselves. Most important, the exercise serves to express important thoughts and feelings that can be dealt with once they are made explicit. Participants are typically not even aware that they have these feelings. By providing a vehicle for discovering and expressing their concerns, it is possible to surface major issues that may later affect evaluation use. Shared metaphors can help establish a common framework for the evaluation, capturing its purpose, its possibilities, and the safeguards that need to be built into the process. Robert Frost once observed, "All thought is a feat of association: Having what's in front of you bring up something in your mind that you almost didn't know you knew." This exercise helps participants bring to mind things about evaluation they almost didn't know they knew.

By the way, I've used this exercise with many different groups and in many different situations, including cross-cultural settings, and I've never yet encountered someone who couldn't find an object to use in saying something about evaluation. One way of guaranteeing this is to include in your box of items some things that have a pretty clear and simple message. For example, I'll always include a lock and key so that a very simple and fairly obvious analogy can be made: "Evaluation is like a lock and key; if you have the right key you can open up the lock and make it work. If you have the right information you can make the thing work." Or I'll include a lightbulb so that someone can say "evaluation is like this lightbulb; its purpose is to shed light on the situation."

The Cutting Edge of Metaphors

Metaphors can open up new understandings and enhance communications. They can also distort and offend. At the 1979 meeting of the Midwest Sociological Society, well-known sociologist Morris Janowitz was asked to participate in a panel on the question "What is the cutting edge of sociology?" Janowitz (1979), having written extensively on the sociology of the military, took offense at the "cutting edge" metaphor. He explained, " 'Cutting edge' is a military term. I am put off by the very term, cutting edge, like the parallel term breakthrough: slogans which intellectuals have inherited from the managers of violence" (p. 601).

Strategic planning is a label with military origins and connotations, as is rapid reconnaissance, a phrase sometimes used to describe certain quick, exploratory evaluation efforts. Some stakeholder groups will object to such associations; others will relate positively. Evaluators, therefore, must be sensitive in their selection of metaphors to avoid offensive comparisons and match analogies to stakeholders' interests. Of par-
ticular importance, in this regard, is avoiding the use of metaphors with possible racist and sexist connotations, for example, "It's black and white" or "We want to get inside the Black Box of evaluation."

As Minnich (1990) has observed in her important book, Transforming Knowledge, our language and thinking can perpetuate "the old exclusions and devaluations of the majority of humankind that have pervaded our informal as well as formal schooling" (p. 1). She observed further that

even when we are all speaking the same languages, there are many "languages" at play behind and within what the speakers mean and what we in turn understand . . . , levels and levels of different meanings in even the most apparently simple and accessible utterance. (p. 9)

Minnich's point was nicely illustrated at a conference on educational evaluation where a Women's Caucus formed to express concerns about the analogies used in evaluation and to suggest some alternatives.

To deal with diversity is to look for new metaphors. We need no new weapons of assessment—the violence has already been done! How about brooms to sweep away the attic-y cobwebs of our male/female stereotypes? The tests and assessment techniques we frequently use are full of them. How about knives, forks, and spoons to sample the feast of human diversity in all its richness and color? Where are the techniques that assess the deliciousness of response variety, independence of thought, originality, uniqueness? (And lest you think those are female metaphors, let me do away with that myth—at our house everybody sweeps and everybody eats!) Our group talked about another metaphor—the cafeteria line versus the smorgasbord banquet styles of teaching/learning/assessing. Many new metaphors are needed as we seek clarity in our search for better ways of evaluating. To deal with diversity is to look for new metaphors. (Hurty 1976)

As we look for new metaphors in evaluation, we would do well to do so in the spirit of Thoreau, who observed, "All perception of truth is the detection of an analogy." The added point for utilization-focused evaluators is the admonition to be sensitive in selecting metaphors that are meaningful to specific intended users. The importance of such sensitivity stems from the centrality of "the personal factor" in evaluation use, the subject of the next chapter. First, however, a closing metaphor.

Navigating Evaluation's Rough Seas

A common error made by novice evaluators is believing that because someone has requested an evaluation or some group has been assembled to design an evaluation, the commitment to reality testing and use is already there. Quite the contrary, these commitments must be engendered (or revitalized if once they were present) and then reinforced throughout the evaluation process. Utilization-focused evaluation makes this a priority.

It's all too easy for those of us trained in research methods to forget that "evaluation is an unnatural act." (Buttons and bumper stickers with this slogan evoke interesting responses from intended users.) Evaluation is not natural to managers, funders, policymakers, program staff, or program participants. That's why they need profes-

sional assistance, support, training, and facilitation.

Utilization-focused evaluation offers a philosophical harbor to sail toward when the often rough and stormy seas of evaluation threaten to blow the evaluator off course. With each new evaluation, the evaluator sets out, like an ancient explorer, on a quest for useful knowledge, not sure whether seas will be gentle, tempestuous, or becalmed. Along the way the evaluator will often encounter any number of challenges: political intrigues wrapped in mantles of virtue; devious and flattering antagonists trying to co-opt the evaluation in service of their own narrow interests and agendas; unrealistic deadlines and absurdly limited resources; gross misconceptions about what can actually be measured with precision and definitiveness; deep-seated fears about the evils-incarnate of evaluation, and therefore, evaluators; incredible exaggerations of evaluators' power; and insinuations about defects in the evaluator's genetic heritage. The observant evaluator is also likely to encounter tremendously dedicated staff working under difficult conditions for pitiable wages; program participants who have suffered grievous misfortunes and whose lives seem to hang by the most fragile of threads; administrators working feverishly to balance incredible needs against meager resources; funders and policymakers struggling to make sensible and rational decisions in a world that often seems void of sense and reason. The seas of evaluation offer encounters with discouraging corruption and inspiring virtue, great suffering and hopeful achievements, unmitigated programmatic disasters and solidly documentable successes, and an abundance of ambiguity between these poles of the human experience. The voyage is worth taking, despite the dangers and difficulties, because the potential rewards include making a meaningful difference in the effectiveness of important programs and thereby improving the quality of people's lives. That only happens, however, if the evaluation process and findings are used.

Note

1. I want to emphasize that I am using the term reality testing in its commonsense connotation of finding out what is happening. While philosophers of science will rightly point out that the whole notion of reality is an epistemological quagmire, I find that the people I work with in the "real world"—their phrase—resonate to the notion of reality testing. It is their own sense of reality I want to help them test, not some absolute, positivist construct of reality. The notion that reality is socially constructed doesn't mean it can't be tested and understood. At the 1995 International Evaluation Conference in Vancouver, Ernie House, Will Shadish, Michael Scriven, and I (evaluation theorists with quite different perspectives) participated in a session on theory in which we agreed on the following two propositions, among others: (1) Most theorists postulate a real physical world, although they differ greatly as to its knowability and complexity; and (2) logical positivism is an inadequate epistemology that few theorists advocate any more, either in evaluation or philosophy.
ful achievements, unmitigated program-
Fostering Intended Use by
Intended Users
The Personal Factor

There are five key variables that are absolutely critical in evaluation use. They are, in order of importance: people, people, people, people, and people.

On a damp summer morning at Snow Mountain Ranch near Rocky Mountain National
Park, some 40 human service and education professionals have gathered from all over the
country in a small, dome-shaped chapel to participate in an evaluation workshop. The
session begins like this:

Instead of beginning by my haranguing you about what you should do in program

evaluation, we're going to begin with an evaluation exercise to immerse us immediately
in the process. I'm going to ask you to play the dual roles of participants and evaluators
since that's the situation most of you find yourselves in anyway in your own agencies
and programs, where you have both program and evaluation responsibilities. We're
going to share an experience to loosen things up a bit. . . perhaps warm you up, wake
you up, and allow you to get more comfortable. The exercise will also allow us to test
your participant observer skills and provide us with a common experience as evaluators.
We'll also generate some personal data about the process of evaluation that we can use
for discussion later.


So what I want you to do for the next five minutes is move around this space in any
way you want to. Explore this environment. Touch and move things. Experience
different parts of this lovely setting. And while you're observing the physical environ-
ment, watch what others do. Then, find a place where you feel comfortable to write
down what you observed, and also to evaluate the exercise. Experience, explore, observe,
and evaluate. That's the exercise.

At the end of the writing time, participants shared, on a voluntary basis, what they had written.

First Observer: People slowly got up. Everybody looked kind of nervous 'cause they
weren't sure what to do. People moved out toward the walls, which
are made of rough wood. The lighting is kind of dim. People sort of
moved counterclockwise. Every so often there would be a nervous
smile exchanged between people. The chairs are fastened down in
rows so it's hard for people to move in the center of the room. A
few people went to the stage area, but most stayed toward the back
and outer part. The chairs aren't too comfortable, but it's a quiet,
mellow room. The exercise showed that people are nervous when
they don't know what to do.
Second Observer: The room is hexagon-shaped with a dome-shaped ceiling. Fastened-
down chairs are arranged in a semicircle with a stage in front that
is about a foot high. A podium is at the left of the small stage. Green
drapes hang at the side. Windows are small and triangular. The floor
is wood. There's a coffee table in back. Most people went to get
coffee. A couple people broke the talking rule for a minute. Every-
one returned to about the same place they had been before after
walking around. It's not a great room for a workshop, but it's OK.
Third Observer: People were really nervous about what to do because the goals of
the exercise weren't clear. You can't evaluate without clear goals so
people just wandered around. The exercise shows you can't evaluate
without clear goals.
Fourth Observer: I said to myself at the start, this is a human relations thing to get us
started. I was kind of mad about doing this because we've been here
a half hour already, and we haven't done anything that has to do
with evaluation. I came to learn about evaluation, not to do touchy-
feely group stuff. So I just went to get coffee. I didn't like wasting
so much time on this.
Fifth Observer: I felt uneasy, but I told myself that it's natural to feel uneasy when
you aren't sure what to do. But I liked walking around, looking at
the chapel, and feeling the space. I think some people got into it,
but we were stiff and uneasy. People avoided looking at each other.
Sometimes there was a nervous smile when people passed each
other, but by kind of moving in a circle, most people went the same
direction and avoided looking at each other. I think I learned
something about myself and how I react to a strange, nervous situation.
These observations were followed by a discussion of the different perspectives reported on the same experience and speculation on what it would take to produce a more focused set of observations and evaluations. Suggestions included establishing clear goals; specifying evaluation criteria; figuring out what was supposed to be observed in advance so everyone could observe it; giving clearer directions of what to do; stating the purpose of evaluation; and training the evaluation observers so that they all recorded the same thing.

Further discussion revealed that before any of these evaluation tasks could be completed, a prior step would be necessary: determining who the primary intended users of the evaluation are. This task constitutes the first step in utilization-focused evaluation. Taking this first step is the focus of this chapter.

The First Step in Utilization-Focused Evaluation

Many decisions must be made in any evaluation. The purpose of the evaluation must be determined. Concrete evaluative criteria for judging program success will usually have to be established. Methods will have to be selected and time lines agreed on. All of these are important issues in any evaluation. The question is: Who will decide these issues? The utilization-focused answer is: primary intended users of the evaluation.

Clearly and explicitly identifying people who can benefit from an evaluation is so important that evaluators have adopted a special term for potential evaluation users: stakeholders. This term has been borrowed from management consulting, where it was coined in 1963 at the Stanford Research Institute as a way of describing people who were not directly stockholders in a company but "without whose support the firm would cease to exist" (Mendelow 1987:177).

Stakeholder management is aimed at proactive action—action aimed, on the one hand, at forestalling stakeholder activities that could adversely affect the organization and, on the other hand, at enabling the organization to take advantage of stakeholder opportunities. . . . This can be achieved only through a conscious decision to adopt the stakeholder perspective as part of a strategy formulation process. (Mendelow 1987:177-78)

Evaluation stakeholders are people who have a stake—a vested interest—in evaluation findings. For any evaluation, there are multiple possible stakeholders: program funders, staff, administrators, and clients or program participants. Others with a direct, or even indirect, interest in program effectiveness may be consid-


ered stakeholders, including journalists ferent things in part because they were
and m e m b e r s of the general public, or interested in different things. They "evalu-
more specifically, taxpayers, in the case of ated" the exercise in different ways, and'
public programs. Stakeholders include many had trouble "evaluating" the exercise
anyone w h o makes decisions or desires at all, in part because they didn't k n o w for
information about a program. H o w e v e r , w h o m they were writing. There were sev-
stakeholders typically have diverse a n d eral potential users of an evaluation of the
often competing interests. N o evaluation "explore the environment" exercise:
can answer all potential questions equally
well. This means that some process is nec- 1. As a workshop leader, I might want to evalu-
essary for n a r r o w i n g the range of possible ate the extent to which the exercise accom-
questions to focus the evaluation. In utili- plished my objectives.
zation-focused evaluation, this process be- 2. Each individual participant might conduct a
gins by narrowing the list of potential personal evaluation according to his or her
stakeholders to a much shorter, more spe- own criteria.
cific group of primary intended users. 3. The group could establish consensus goals
Their information needs, that is, their in- for the exercise, which would then serve as
tended uses, focus the evaluation. focus for the evaluation.
The workshop exercise that opened this 4. The bosses, agency directors, and/or funding
chapter illustrates the importance of clearly boards who paid for participants to attend
identifying primary intended users. The might want an assessment of the return on
participants in that exercise observed dif- the resources they have invested for training.
Fostering Intended Use by Intended Users • 43

5. The Snow Mountain Ranch director might want an evaluation of the appropriateness of the chapel for such a workshop.

6. The building architects might want an evaluation of how participants responded to the space they designed.

7. Professional workshop facilitators might want to evaluate the exercise's effectiveness for opening a workshop.

8. Psychologists or human relation trainers might want to assess the effects of the exercise on participants.

9. Experiential learning educators might want an assessment of the exercise as an experiential learning tool.

10. The janitors of the chapel might want an evaluation of the work engendered for them by an exercise that permits moving things around (which sometimes occurs to destructive proportions when I've used the exercise in settings with moveable furniture).

This list of people potentially interested in the evaluation (stakeholders) could be expanded. The evaluation question in each case would likely be different. I would have different evaluation information needs as workshop leader than would the camp director; the architects' information needs would differ from the janitors' "evaluation" questions; the evaluation criteria of individual participants would differ from those reached by the total group through a consensus-formation process.

Beyond Audience

The preceding discourse is not aimed at simply making the point that different people see things differently and have varying interests and needs. I take that to be on the order of a truism. The point is that this truism is regularly and consistently ignored in the design of evaluation studies. To target an evaluation at the information needs of a specific person or a group of identifiable and interacting persons is quite different from what has been traditionally recommended as "identifying the audience" for an evaluation. Audiences are amorphous, anonymous entities. Nor is it sufficient to identify an agency or organization as a recipient of the evaluation report. Organizations are an impersonal collection of hierarchical positions. People, not organizations, use evaluation information. I shall elaborate these points later in this chapter. First, I want to present data from a study of how federal health evaluations were used. Those findings provide a research foundation for this first step in utilization-focused evaluation. In the course of presenting these data, it will also become clearer how one identifies primary intended users and why they are the key to specifying and achieving intended uses.

Studying Use: Identification of the Personal Factor

In the mid-1970s, as evaluation was emerging as a distinct field of professional practice, I undertook a study with colleagues and students of 20 federal health evaluations to assess how their findings had been used and to identify the factors that affected varying degrees of use. We interviewed the evaluators and those for whom the evaluations were conducted.¹ That study marked the beginning of the formulation of utilization-focused evaluation presented in this book.

We asked respondents to comment on how, if at all, each of 11 factors extracted from the literature on utilization had affected use of their study. These factors were

methodological quality, methodological appropriateness, timeliness, lateness of report, positive or negative findings, surprise of findings, central or peripheral program objectives evaluated, presence or absence of related studies, political factors, decision maker/evaluator interactions, and resources available for the study. Finally, we asked respondents to "pick out the single factor you feel had the greatest effect on how this study was used."

From this long list of questions, only two factors emerged as consistently important in explaining utilization: (a) political considerations, to be discussed in Chapter 14, and (b) a factor we called the personal factor. This latter factor was unexpected, and its clear importance to our respondents had, we believed, substantial implications for the use of program evaluation. None of the other specific literature factors about which we asked questions emerged as important with any consistency. Moreover, when these specific factors were important in explaining the use or nonuse of a particular study, it was virtually always in the context of a larger set of circumstances and conditions related to either political considerations or the personal factor.

The personal factor is the presence of an identifiable individual or group of people who personally care about the evaluation and the findings it generates. Where such a person or group was present, evaluations were used; where the personal factor was absent, there was a correspondingly marked absence of evaluation impact.

The personal factor represents the leadership, interest, enthusiasm, determination, commitment, assertiveness, and caring of specific, individual people. These are people who actively seek information to make judgments and reduce decision uncertainties. They want to increase their ability to predict the outcomes of programmatic activity and thereby enhance their own discretion as decision makers, policymakers, consumers, program participants, and funders, or whatever role they play. These are the primary users of evaluation.

Data on the Importance of the Personal Factor

The personal factor emerged most dramatically in our interviews when, having asked respondents to comment on the importance of each of our 11 utilization factors, we asked them to identify the single factor that was most important in explaining the impact or lack of impact of that particular study. Time after time, the factor they identified was not on our list. Rather, they responded in terms of the importance of individual people:

Item: I would rank as the most important factor this division director's interest, [his] interest in evaluation. Not all managers are that motivated toward evaluation. [DM353:17]²

Item: [The single most important factor that had the greatest effect on how the study got used was] the principal investigator. . . . If I have to pick a single factor, I'll pick people anytime. [DM328:20]

Item: That it came from the Office of the Director—that's the most important factor. . . . The proposal came from the Office of the Director. It had his attention and he was interested in it, and he implemented many of the things. [DM312:21]

Item: [The single most important factor was that] the people at the same level of decision making in [the new office] were not interested in making decisions of the kind that the people [in the old office] were, I think that

probably had the greatest impact. The fact that there was no one at [the new office] after the transfer who was making programmatic decisions. [EV361:27]

Item: Well, I think the answer there is in the qualities of the people for whom it was made. That's sort of a trite answer, but it's true. That's the single most important factor in any study now that's utilized. [EV232:22]

Item: Probably the single factor that had the greatest effect on how it was used was the insistence of the person responsible for initiating the study that the Director become familiar with its findings and arrive at judgment on it. [DM369:25]

Item: [The most important factor was] the real involvement of the top decision makers in the conceptualization and design of the study, and their commitment to the study.

While these comments concern the importance of interested and committed individuals in studies that were actually used, studies that were not used stand out in that there was often a clear absence of the personal factor. One evaluator, who was not sure how his study was used, but suspected it had not been, remarked,

I think that since the client wasn't terribly interested . . . and the whole issue had shifted to other topics, and since we weren't interested in doing it from a research point of view . . . nobody was interested. [EV264:14]

Another highly experienced evaluator was particularly adamant and articulate on the theory that the major factor affecting use is the personal energy, interests, abilities, and contacts of specific individuals. When asked to identify the one factor that is most important in whether a study gets used, he summarized his viewpoint as follows:

The most important factor is desire on the part of the managers, both the central federal managers and the site managers. I don't think there's [any doubt], you know, that evaluation should be responsive to their needs, and if they have a real desire to get on with whatever it is they're supposed to do, they'll apply it. And if the evaluations don't meet their needs, they won't. About as simple as you can get. I think the whole process is far more dependent on the skills of the people who use it than it is on the sort of peripheral issues of politics, resources. . . . Institutions are tough as hell to change. You can't change an institution by coming and doing an evaluation with a halo. Institutions are changed by people, in time, with a constant plugging away at the purpose you want to accomplish. And if you don't watch out, it slides back.

His view had emerged early in the interview when he described how evaluations were used in the U.S. Office of Economic Opportunity (OEO):

In OEO, it depended on who the program officer was, on the program review officials, on program monitors for each of these grant programs. . . . Where there were aggressive program people, they used these evaluations whether they understood them or not. They used them to effect improvements, direct allocations of funds within the program, explain why the records were kept this way, why the reports weren't complete or whatever. Where the program officials were unaggressive, passive—nothing!

Same thing's true at the project level. Where you had a director who was aggressive and understood what the hell the structure was internally, he used evaluation as leverage to change what went on within his program. Those who weren't—nothing! [EV346:5]

At another point he observed, "The basic thing is how the administrators of the program view themselves and their responsibilities. That's the controlling factor."

The same theme emerged in his comments about each possible factor. Asked about the effects on use of methodological quality, positive or negative findings, and the degree to which the findings were expected, he always returned eventually to the importance of managerial interest, competence, and confidence. The person makes the difference.

Our sample included another rather adamant articulation of this premise. An evaluation of a pilot program involving four major projects was undertaken at the instigation of the program administrator. He made a special effort to make sure that his question (i.e., Were the pilot projects capable of being extended and generalized?) was answered. He guaranteed this by personally taking an active interest in all parts of the study. The administrator had been favorable to the program in principle, was uncertain what the results would be, but was hoping that the program would prove effective. The evaluation findings were, in fact, negative. The program was subsequently ended, with the evaluation carrying "considerable weight" in that decision [DM367:8]. Why was this study used in such a dramatic way? His answer was emphatic:

Look, we designed the project with an evaluation component in it, so we were committed to use it and we did. . . . It's not just the fact that [evaluation] was built in, but the fact that we built it in on purpose. That is, the agency head and myself had broad responsibilities for this, wanted the evaluation study results, and we expected to use them. Therefore, they were used. That's my point. If someone else had built it in because they thought it was needed, and we didn't care, I'm sure the use of the study results would have been different. [DM367:12]

The evaluator (an external agent selected through an open request-for-proposal process) independently corroborated the decision maker's explanation:

The principal reason [for use] was that the decision maker was the guy who requested the evaluation and used its results. That is, the organizational distance between the policymaker and the evaluator was almost zero in this instance. That's the most important reason it had an impact. . . . It was the fact that the guy who was asking the question was the guy who was going to make use of the answer. [EV367:12]

Here, then, is a case in which a decision maker commissioned an evaluation knowing what information he needed; the evaluator was committed to answering the decision maker's questions; and the decision maker was committed to using the findings. The result was a high level of use in making a decision contrary to the director's initial personal hopes. In the words of the evaluator, the major factor explaining use was that "the guy who was going to be making the decision was aware of and interested in the findings of the study and had some hand in framing the questions to be answered; that's very important" [EV367:20].

The program director's overall conclusion gets to the heart of the personal factor:


Factors that made a positive contribution to use? One would be that the decision makers themselves want the evaluation study results. I've said that several times. If that's not present, it's not surprising that the results aren't used. [DM367:17]

This point was made often in the interviews. One highly placed and widely experienced administrator offered the following advice at the end of a four-hour interview:

Win over the program people. Make sure you're hooked into the people who're going to make the decision in six months from the time you're doing the study, and make sure that they feel it's their study, that these are their ideas, and that it's focused on their values. [DM283:40]

Presence of the personal factor increases the likelihood of long-term follow-through, that is, persistence in getting evaluation findings used. One study in particular stood out in this regard. It was initiated by a new office director with no support internally and considerable opposition from other affected agencies. The director found an interested and committed evaluator. The two worked closely together. The findings were initially ignored because it wasn't a hot political issue at the time, but over the ensuing four years, the director and evaluator personally worked to get the attention of key members of Congress. The evaluation eventually contributed to passing significant legislation in a new area of federal programming. From beginning to end, the story was one of personal human commitment to getting evaluation results used.

Although the specifics vary from case to case, the pattern is markedly clear: Where the personal factor emerges, where some individuals take direct, personal responsibility for getting findings to the right people, evaluations have an impact. Where the personal factor is absent, there is a marked absence of impact. Use is not simply determined by some configuration of abstract factors; it is determined in large part by real, live, caring human beings.

Supporting Research on the Personal Factor

James Burry (1984) of the UCLA Center for the Study of Evaluation conducted a thorough review of the voluminous literature on evaluation utilization. That review was the basis for a synthesis of factors that affect evaluation use (Alkin et al. 1985). The synthesis grew out of empirical research on evaluation utilization (Alkin, Daillak, and White 1979) and organizes the various factors in three major categories: human factors, context factors, and evaluation factors.

Human factors reflect evaluator and user characteristics with a strong influence on use. Included here are such factors as people's attitudes toward and interest in the program and its evaluation, their backgrounds and organizational positions, and their professional experience levels.

Context factors consist of the requirements and fiscal constraints facing the evaluation, and relationships between the program being evaluated and other segments of its broader organization and the surrounding community.

Evaluation factors refer to the actual conduct of the evaluation, the procedures used in the conduct of the evaluation, and the quality of the information it provides. (Burry 1984:1)

The primary weakness of this framework is that the factors are undifferentiated in terms of importance. Burry ended up with a checklist of factors that may influence evaluation, but no overall hierarchy was presented in his synthesis; that is, a hierarchy that places more importance on certain factors as necessary and/or sufficient conditions for evaluation use. At a 1985 conference on evaluation use sponsored by the UCLA Center for the Study of Evaluation, I asked Jim Burry if his extensive review of the literature suggested any factors as particularly important in explaining use. He answered without hesitation:

There's no question about it. The personal factor is far and away the most important. You're absolutely right in saying that the personal factor is the most important explanatory variable in evaluation use. The research of the last five years confirms the primacy of the personal factor. (personal conversation 1985)

Lester and Wilds (1990) conducted a comprehensive review of the literature on use of public policy analysis. Based on that review, they developed a conceptual framework to predict use. Among the hypotheses they found supported were these:

• The greater the interest in the subject by the decision maker, the greater the likelihood of utilization.

• The greater the decision maker's participation in the subject and scope of the policy analysis, the greater the likelihood of utilization. (p. 317)

These hypotheses were further confirmed in the evaluation literature in special issues of New Directions for Program Evaluation devoted to "Stakeholder-Based Evaluation" (Bryk 1983), "The Client Perspective in Evaluation" (Nowakowski 1987), and "Evaluation Utilization" (McLaughlin et al. 1988). Marvin Alkin (1985), founder and former director of the Center for the Study of Evaluation at the University of California, Los Angeles, made the personal factor the basis for his Guide for Evaluation Decision-Makers. Jean King concluded from her research review (1988) and case studies (1995) that involving the right people is critical to evaluation use. In a major analysis of "the Feasibility and Likely Usefulness of Evaluation," Joseph Wholey (1994) has shown that involving intended users early is critical so that "the intended users of the evaluation results have agreed on how they will use the information" (p. 16) before the evaluation is conducted. Cousins, Donohue, and Bloom (1995) reviewed a great volume of research on evaluation and found that "a growing body of data provide support" for the proposition that "increased participation in research by stakeholders will heighten the probability that research data will have the intended impact" (p. 5). Johnson (1995) used conjoint measurement and analysis to estimate evaluation use and found that evaluators attribute increased use to increased participation in the evaluation process by practitioners. And Carol Weiss (1990), one of the leading scholars of knowledge use, concluded in her keynote address to the American Evaluation Association:

First of all, it seems that there are certain participants in policy making who tend to be 'users' of evaluation. The personal factor—a person's interest, commitment, enthusiasm—plays a part in determining how much

influence a piece of research will have. (p. 177)

The need for interactive dialogue at a personal level applies to large-scale national evaluations as well as smaller-scale, local evaluations (Dickey 1981). Wargo (1995) analyzed three unusually successful federal evaluations in a search for "characteristics of successful program evaluations"; he found that active involvement of key stakeholders was critical at every stage: during planning, while conducting the evaluation, and in dissemination of findings (p. 77). In 1995, the U.S. General Accounting Office (GAO) studied the flow of evaluative information to Congress by following up three major federal programs: the Comprehensive Child Development Program, the Community Health Centers program, and the Chapter 1 Elementary and Secondary Education Act, aimed at providing compensatory education services to low-income students. Analysts concluded that underutilization of evaluative information was a direct function of poor communications between intended users (members of the Senate Committee on Labor and Human Resources) and responsible staff in the three programs:

Finally, we observed that communication between the Committee and agency staff knowledgeable about program information was limited and comprised a series of one-way communications (from the Committee to the agency or the reverse) rather than joint discussion. This pattern of communication, which was reinforced by departmental arrangements for congressional liaison, affords little opportunity to build a shared understanding about the Committee's needs and how to meet them. (GAO 1995:40)

The GAO (1995) report recommended that Senate Committee members have "increased communication with agency program and evaluation staff to help ensure that information needs are understood and that requests and reports are suitably framed and are adapted as needs evolve" (p. 41). This recommendation affirms the importance of personal interactions as a basis for mutual understanding to increase the relevance and, thereby, the utility of evaluation reports.

Another framework that supports the importance of the personal factor is the "Decision-Oriented Educational Research" approach of Cooley and Bickel (1985). Although the label for this approach implies a focus on decisions rather than people, in fact the approach is built on a strong "client orientation." This client orientation means that the primary intended users of decision-oriented educational research are clearly identified and then involved in all stages of the work through ongoing dialogue between the researcher and the client. Cooley and Bickel presented case evidence to document the importance of being client-oriented.

In a major review of evaluation use in nonprofit organizations, the Independent Sector concluded that attending to "the human side of evaluation" makes all the difference. "Independent Sector learned that evaluation means task, process, and people. It is the people side—the human resources of the organization—who make the 'formal' task and process work and will make the results work as well" (Moe 1993:19).

The evaluation literature contains substantial additional evidence that working with intended users can increase use (e.g., Bedell et al. 1985; Dawson and D'Amico 1985; King 1985; Lawler et al. 1985; Siegel and Tuckel 1985; Cole 1984; Evans

and Blunden 1984; Hevey 1984; Rafter 1984; Bryk 1983; Campbell 1983; Glaser, Abelson, and Garrison 1983; Lewy and Alkin 1983; Stalford 1983; Barkdoll 1982; Beyer and Trice 1982; Canadian Evaluation Society 1982; King and Pechman 1982; Saxe and Koretz 1982; Dickey and Hampton 1981; Leviton and Hughes 1981; Alkin and Law 1980; Braskamp and Brown 1980; Studer 1978).

Support for the importance of the personal factor also emerged from the work of the Stanford Evaluation Consortium, one of the leading places of ferment and reform in evaluation during the late 1970s and early 1980s. Cronbach and associates in the Consortium identified major reforms needed in evaluation by publishing a provocative set of 95 theses, following the precedent of Martin Luther. Among their theses was this observation on the personal factor: "Nothing makes a larger difference in the use of evaluations than the personal factor—the interest of officials in learning from the evaluation and the desire of the evaluator to get attention for what he knows" (Cronbach et al. 1980:6; emphasis added).

Evaluation's Premier Lesson

The importance of the personal factor in explaining and predicting evaluation use leads directly to the emphasis in utilization-focused evaluation on working with intended users to specify intended uses. The personal factor directs us to attend to specific people who understand, value, and care about evaluation and further directs us to attend to their interests. This is the primary lesson the profession has learned about enhancing use, and it is wisdom now widely acknowledged by practicing evaluators, as evidenced by research on evaluators' beliefs and practices conducted by Cousins et al. (1995).

Cousins and his colleagues (1995) surveyed a sample of 564 evaluators and 68 practitioners drawn from the membership lists of professional evaluation associations in the United States and Canada. The survey included a list of possible beliefs that respondents could agree or disagree with. Greatest consensus centered on the statement "Evaluators should formulate recommendations from the study." (I'll discuss recommendations in a later chapter.) The item eliciting the next highest agreement was "The evaluator's primary function is to maximize intended uses by intended users of evaluation data" (p. 19). Given widespread agreement about the desired outcome of evaluation, namely, intended uses by intended users, let's now examine some of the practical implications of this perspective.

Practical Implications of the Personal Factor

First, in order to work with primary intended users to achieve intended uses, the evaluation process must surface people who want to know something. This means locating people who are able and willing to use information. The number may vary from one prime user to a fairly large group representing several constituencies, for example, a task force of program staff, clients, funders, administrators, board members, community representatives, and officials or policymakers (see Exhibit 3.1). Cousins et al. (1995) surveyed evaluators and found that they reported six stakeholders as the median number typically involved in a project. While stakeholders' points of view may vary on any number of issues, what they should share is a genuine interest in

A Statewide Evaluation Task Force

The Personal Factor means getting key influentials together, face-to-face, to negotiate the design.
Here's an example.
In 1993, the Minnesota Department of Transportation created eight Area Transportation
Partnerships to make decisions about roads and other transportation investments in a cooperative
fashion between state and local interests. To design and oversee the study of how the partnerships
were working, a technical panel was created to represent the diverse interests involved. Members of
the technical panel included:

The District Engineer from District 1 (Northeast)

The Planning Director from District 6 (Southeast)
The District Planner from District 7 (South central)
Planner for a Regional Development Council (Northwest)
Department of Transportation Director of Economic Analysis and Special Studies, State Office of
Investment Management
An influential county commissioner
Director of a regional transit operation
Director of a regional metropolitan Council of Governments (Western part of the state)
Member of the Metropolitan Council Transportation Advisory Committee (Greater Minneapolis/
Saint Paul)
A county engineer
A private transportation consultant
A city engineer from a small town
A metropolitan planning and research engineer
The State Department of Transportation Interagency Liaison
A University of Minnesota researcher from the University's Center for Transportation Studies
An independent evaluation consultant (not the project evaluator)
Five senior officials from various offices of the State Department of Transportation
The evaluator and two assistants

This group met quarterly throughout the study. The group made substantive improvements in the
original design, gave the study credibility with different stakeholder groups, participated in interpreting
findings, and laid the groundwork for use.

using evaluation, an interest manifest in a willingness to take the time and effort to work through their information needs and interests. Thus, the first challenge in evaluation is to answer seriously and searchingly the question posed by Marvin Alkin (1975a): "Evaluation: Who Needs It? Who Cares?" Answering this question, as we shall see, is not always easy, but it is always critical.

Second, formal position and authority are only partial guides in identifying primary users. Evaluators must find strategically located people who are enthusiastic, committed, competent, interested, and assertive. Our data suggest that more may be accomplished by working with a lower-level person displaying these characteristics than by working with a passive, uninterested person in a higher position.

Third, quantity, quality, and timing of interactions with intended users are all important. A large amount of interaction between evaluators and users with little substance may backfire and actually reduce stakeholder interest. Evaluators must be strategic and sensitive in asking for time and involvement from busy people, and they must be sure they're interacting with the right people around relevant issues. Increased contact by itself is likely to accomplish little. Nor will interaction with the wrong people (i.e., those who are not oriented toward use) help much. It is the nature and quality of interactions between evaluators and decision makers that is at issue. My own experience suggests that where the right people are involved, the amount of direct contact can sometimes be reduced because the interactions that do occur are of such high quality. Later, when we review the decisions that must be made in the evaluation process, we'll return to the issues of quantity, quality, and timing of interactions with intended users.

Fourth, evaluators will typically have to work to build and sustain interest in evaluation use. Identifying intended users is part selection and part nurturance. Potential users with low opinions of or little interest in evaluation may have had bad prior experiences or just not have given much thought to the benefits of evaluation. The second chapter discussed ways of cultivating interest in evaluation and building commitment to use. Even people initially inclined to value evaluation will still often need training and support to become effective information users.

Fifth, evaluators need skills in building relationships, facilitating groups, managing conflict, walking political tightropes, and effective interpersonal communications to capitalize on the importance of the personal factor. Technical skills and social science knowledge aren't sufficient to get evaluations used. People skills are critical. Ideals of rational decision making in modern organizations notwithstanding, personal and political dynamics affect what really happens. Evaluators without the savvy and skills to deal with people and politics will find their work largely ignored, or, worse yet, used inappropriately.

Sixth, a particular evaluation may have multiple levels of stakeholders and therefore need multiple levels of stakeholder involvement. For example, funders, chief executives, and senior officials may constitute the primary users for overall effectiveness results, while lower-level staff and participant stakeholder groups may be involved in using implementation and monitoring data for program improvement. Exhibit 3.2 provides an example of such a multiple-level structure for different levels of stakeholder involvement and evaluation use.

Menu 3.1 summarizes these practical implications of the personal factor for use.

Diversions Away From Intended Users

To appreciate some of the subtleties of the admonition to focus on intended use by intended users, let's consider a few of the
Fostering Intended Use by Intended Users • 53

EXHIBIT 3.2
A Multilevel Stakeholder Structure and Process

The Saint Paul Foundation formed a Donor Review Board of several philanthropic foundations in Minnesota to fund a project, Supporting Diversity in Schools (SDS). The project established local school-community partnerships with communities of color: African Americans, Hispanics, Native Americans, and Southeast Asians. The evaluation had several layers based on different levels of stakeholder involvement and responsibility.

Stakeholder Group: Donor Review Board (Executives and Program Officers from contributing Foundations and School Superintendent)
Evaluation Focus: Overall effectiveness; policy implications; sustainability
Nature of Involvement: Twice-a-year meetings to review the design and interim evaluation results; final report directed to this group

Stakeholder Group: District Level Evaluation Group (Representatives from participating schools, social service agencies, community organizations, and project staff)
Evaluation Focus: Implementation monitoring in early years; district-level outcomes in later years
Nature of Involvement: An initial full-day retreat with 40 people from diverse groups; annual retreat sessions to update, refocus, and interpret interim findings

Stakeholder Group: Partnership Level Evaluation Teams (Teachers, community representatives, and evaluation staff liaisons)
Evaluation Focus: Documenting activities and outcomes at the local partnership level: one school, one community of color
Nature of Involvement: Annual evaluation plan; completing evaluation documents for every activity; quarterly review of progress to use findings for program improvement

temptations evaluators face that lure them away from the practice of utilization-focused evaluation.

First, and most common, evaluators are tempted to make themselves the major decision makers for the evaluation. This can happen by default (no one else is willing to do it), by intimidation (clearly, the evaluator is the expert), or simply by failing to think about or seek primary users (why make life difficult?). The tip-off that evaluators have become the primary intended users (either by intention or default) is that the evaluators are answering their own questions according to their own interests, needs, and priorities. Others may have occasional input here and there, but what emerges is an evaluation by the evaluators, for the evaluators, and of the evaluators. Such studies are seldom of use to other stakeholders, whose reactions are likely to be, "Great study. Really well done. Shows lots of work, but doesn't tell us anything we want to know."

A less innocent version of this scenario occurs when academics pursue their basic research agendas under the guise of evaluation research. The tip-off here is that the

MENU 3.1

Implications of the Personal Factor for Planning Use

• Find and cultivate people who want to learn.
• Formal position and authority are only partial guides in identifying primary users. Find strategically located people who are enthusiastic, committed, competent, and interested.
• Quantity, quality, and timing of interactions with intended users are all important.
• Evaluators will typically have to work to build and sustain interest in evaluation use. Building effective relationships with intended users is part selection, part nurturance, and part training.
• Evaluators need people skills in how to build relationships, facilitate groups, manage conflict, walk political tightropes, and communicate effectively.
• A particular evaluation may have multiple levels of stakeholders and therefore need multiple levels of stakeholder involvement. (See Exhibit 3.2.)

evaluators insist on designing the study in such a way as to test some theory they think is particularly important, whether or not people involved in the program see any relevance to such a test.

A second temptation that diverts evaluators from focusing on specific intended users is to fall prey to the seemingly stakeholder-oriented "identification of audience" approach. Audiences turn out to be relatively passive groups of largely anonymous faces: the "feds," state officials, the legislature, funders, clients, the program staff, the public, and so forth. If specific individuals are not identified from these audiences and organized in a manner that permits meaningful involvement in the evaluation process, then, by default, the evaluator becomes the real decision maker and stakeholder ownership suffers, with a corresponding threat to utility. This is my critique of "responsive evaluation" as advocated by Stake (1975) and Guba and Lincoln (1981). Responsive evaluation "takes as its organizer the concerns and issues of stakeholding audiences" (Guba and Lincoln 1981:23; emphasis in the original). The evaluator interviews and observes stakeholders, then designs an evaluation that is responsive to stakeholders' issues. The stakeholders, however, are no more than sources of data and an audience for the evaluation, not real partners in the evaluation process.

The 1994 revision of the Joint Committee Standards for Evaluation moved to language about "intended users" and "stakeholders" in place of earlier references to "audiences." Thus, in the new version, "the Utility Standards are intended to ensure that an evaluation will serve the information needs of intended users" as opposed to "given audiences" in the original 1981 version (Joint Committee 1994, 1981; emphasis added). The first standard was changed to "Stakeholder Identification"

rather than the original "Audience Identification." Such changes in language are far from trivial. They indicate how the knowledge base of the profession has evolved. The language we use shapes how we think. The nuances and connotations reflected in these language changes are fundamental to the philosophy of utilization-focused evaluation.

A third diversion from intended users occurs when evaluators target organizations rather than specific individuals. This appears to be more specific than targeting general audiences, but really isn't. Organizations as targets can be strangely devoid of real people. Instead, the focus shifts to positions and the roles and authority that attach to positions. Since Max Weber's (1947) seminal essay on bureaucracy gave birth to the study of organizations, sociologists have viewed the interchangeability of people in organizations as the hallmark of institutional rationality in modern society. Under ideal norms of bureaucratic rationality, it doesn't matter who's in a position, only that the position be filled using universalistic criteria. Weber argued that bureaucracy makes for maximum efficiency precisely because the organization of role-specific positions in an unambiguous hierarchy of authority and status renders action calculable and rational without regard to personal considerations or particularistic criteria. Such a view ignores the personal factor. Yet, it is just such a view of the world that has permeated the minds of evaluators when they say that their evaluation is for the federal government, the state, the agency, or any other organizational entity. But organizations do not consume information; people do—individual, idiosyncratic, caring, uncertain, searching people. Who is in a position makes all the difference in the world to evaluation use. To ignore the personal factor is to diminish utilization potential from the outset. To target evaluations at organizations is to target them at nobody in particular—and, in effect, not to really target them at all.

A fourth diversion away from intended users is to focus on decisions instead of on decision makers. This approach is epitomized by Mark Thompson (1975), who defined evaluation as "marshalling of information for the purposes of improving decisions" (p. 26) and made the first step in an evaluation "identification of the decision or decisions for which information is required" (p. 38). The question of who will make the decision remains implicit. The decision-oriented approach stems from a rational social scientific model of how decision making occurs:

1. A clear-cut decision is expected to be made.
2. Information will inform the decision.
3. A study supplies the needed information.
4. The decision is then made in accordance with the study's findings.

The focus in this sequence is on data and decisions rather than people. But people make decisions and, it turns out, most "decisions" accrete gradually and incrementally over time rather than getting made at some concrete, decisive moment (Weiss 1990, 1977; Allison 1971; Lindblom 1965, 1959). It can be helpful, even crucial, to orient evaluations toward future decisions, but identification of such decisions, and the implications of those decisions for the evaluation, are best made in conjunction with intended users who come together to decide what data will be needed for what purposes, including, but not limited to, decisions.

Utilization-focused evaluation is often confused with or associated with decision-oriented approaches to evaluation, in part, I presume, because both approaches are

concrete and focused and both are considered "utilitarian." Ernest House (1980) wrote an important book categorizing various approaches to evaluation in which he included utilization-focused evaluation among the "decision-making models" he reviewed. The primary characteristic of a decision-making model is that "the evaluation be structured by the actual decisions to be made" (p. 28). I believe he incorrectly categorized utilization-focused evaluation because he failed to appreciate the distinct and critical nature of the personal factor. While utilization-focused evaluation includes the option of focusing on decisions, it can also serve a variety of other purposes, depending on the information needs of primary intended users. That is, possible intended uses include a large menu of options, which we'll examine in Chapters 4 and 5. For example, the evaluation process can be important in directing and focusing how people think about the basic policies involved in a program, what has come to be called conceptual use; evaluations can help in fine-tuning program implementation; the process of designing an evaluation may lead to clearer, more specific, and more meaningful program goals; and evaluations can provide information on client needs and assets that will help inform general public discussions about public policy. These and other outcomes of evaluation are entirely compatible with utilization-focused evaluation but do not make a formal decision the driving force behind the evaluation.

Nor does utilization-focused evaluation really fit within any of House's other seven categories, though any of them could be an option in a utilization-focused evaluation if that's the way intended users decided to orient the evaluation: (1) systems analysis, which quantitatively measures program inputs and outcomes to look at effectiveness and efficiency; (2) the behavioral objectives approach, which measures attainment of clear, specific goals; (3) goal-free evaluation, which examines the extent to which actual client needs are being met by the program; (4) the art criticism approach, which makes the evaluator's own expertise-derived standards of excellence a criterion against which programs are judged; (5) the accreditation model, where a team of external accreditors determines the extent to which a program meets professional standards for a given type of program; (6) the adversary approach, in which two teams do battle over the summative question of whether a program should be continued; and (7) the transaction model, which concentrates on program processes.

What is omitted from the House classification scheme is an approach to evaluation that focuses on and is driven by the information needs of specific people who will use the evaluation processes and findings. The point is that the evaluation is user-focused. Utilization-focused evaluation, then, in my judgment, falls within a category of evaluations that I would call, following Marvin Alkin (1995), user-oriented. This is a distinct alternative to the other models identified by House. In the other models, the content of the evaluation is determined by the evaluator's presuppositions about what constitutes an evaluation: a look at the relationship between inputs and outcomes; the measurement of goal attainment; advice about a specific programmatic decision; description of program processes; a decision about future or continued funding; or judgment according to some set of expert or professional standards. In contrast to these models, user-focused evaluation describes an evaluation process for making decisions about the content of an evaluation—but the content itself is not specified or implied in advance.

Thus, any of the eight House models, or adaptations and combinations of those models, might emerge as the guiding direction in user-focused evaluation, depending on the information needs of the people for whom the evaluation information was being collected. Let's continue, now, examining three other temptations that divert evaluators from being user focused.

A fifth temptation is to assume that the funders of the evaluation are the primary intended users, that is, those who pay the fiddler call the tune. In some cases, this is accurate. Funders are hopefully among those most interested in using evaluation. But there may be additional important users. Moreover, evaluations are funded for reasons other than their perceived utility, for example, wanting to give the appearance of supporting evaluation; because legislation or licensing requires evaluation; or because someone thought it had to be written into the budget. Those who control evaluation purse strings may not have any specific evaluation questions. Often, they simply believe that evaluation is a good thing that keeps people on their toes. They may not care about the content of a specific evaluation; they may care only that evaluation—any evaluation—takes place. They mandate the process, but not the substance. Under such conditions (which are not unusual), there is considerable opportunity for identifying and working with additional interested stakeholders to formulate relevant evaluation questions and a correspondingly appropriate design.

A sixth temptation is to put off attending to and planning for use from the beginning. It's tempting to wait until findings are in to worry about use, essentially not planning for use by waiting to see what happens. In contrast, planned use occurs when the intended use by intended users is identified at the beginning. Unplanned use can occur in any evaluation, but relying on the hope that something useful will turn up is a risky strategy. Eleanor Chelimsky (1983) has argued that the most important kind of accountability in evaluation is use that comes from "designed tracking and follow-up of a predetermined use to predetermined user" (p. 160). She calls this a "closed-looped feedback process" in which "the policymaker wants information, asks for it, and is interested in and informed by the response" (p. 160). This perspective solves the problem of defining use, addresses the question of whom the evaluation is for, and builds in evaluation accountability since the predetermined use becomes the criterion against which the success of the evaluation can be judged. Such a process has to be planned.

A seventh and final temptation (seven use-deadly sins seem sufficient, though certainly not exhaustive of the possibilities) is to convince oneself that it is unseemly to enter the fray and thereby run the risks that come with being engaged. I've heard academic evaluators insist that their responsibility is to ensure data quality and design rigor in the belief that the scientific validity of the findings will carry the day. The evidence suggests this seldom happens. An academic stance that justifies the evaluator standing above the messy fray of people and politics is more likely to yield scholarly publications than improvements in programs. Fostering use requires becoming engaged in building relationships and sorting through the politics that enmesh any program. In so doing, the evaluator runs the risks of getting entangled in changing power dynamics, having the rug pulled out by the departure of a key intended user, having relationships go bad, and/or being accused of bias. Later we'll discuss strategies for dealing with these and other risks, but the only way I know to avoid them

MENU 3.2

Temptations Away From Being User-Focused: Seven Use-Deadly Sins

1. Evaluators make themselves the primary decision makers and, therefore, the
primary users.
2. Identifying vague, passive audiences as users instead of real people.
3. Targeting organizations as users (e.g., "the feds") instead of specific persons.
4. Focusing on decisions instead of decision makers.
5. Assuming the evaluation's funder is automatically the primary stakeholder.
6. Waiting until the findings are in to identify intended users and intended uses.
7. Taking a stance of standing above the fray of people and politics.

altogether is to stand aloof; that may provide safety, but at the high cost of utility and relevance.

Menu 3.2 summarizes these seven use-deadly temptations that divert evaluators from clearly specifying and working with intended users.

User-Focused Evaluation in Practice

Lawrence Lynn Jr., Professor of Public Policy at the Kennedy School of Government, Harvard University, has provided excellent evidence for the importance of a user-focused way of thinking in policy analysis and evaluation. Lynn was interviewed by Michael Kirst for Educational Evaluation and Policy Analysis. He was asked, "What would be a test of a 'good policy analysis'?"

One of the conditions of a good policy analysis is that it is helpful to a decision maker. A decision maker looks at it and finds he or she understands the problem better, understands the choices better, or understands the implications of choice better. The decision maker can say that this analysis helped me. (Lynn 1980a:85)

Notice here that the emphasis is on informing the decision maker, not the decision. Lynn argues in his casebook on policy analysis (Lynn 1980b) that a major craft skill needed by policy and evaluation analysts is the ability to understand and make accommodations for a specific decision maker's cognitive style and other personal characteristics. His examples are exemplars of the user-focused approach.

Let me take the example of Elliot Richardson, for whom I worked, or Robert MacNamara, for that matter. These two individuals were perfectly capable of understanding the most complex issues and absorbing details—absorbing the complexity, fully considering it in their own minds. Their intellects were not limited in terms of what they could handle. . . . On the other hand,


you will probably find more typical the decision makers who do not really like to approach problems intellectually. They may be visceral, they may approach issues with a wide variety of preconceptions, they may not like to read, they may not like data, they may not like the appearance of rationality, they may like to see things couched in more political terms, or overt value terms. And an analyst has got to take that into account. There is no point in presenting some highly rational, comprehensive piece of work to a Secretary or an Assistant Secretary of State who simply cannot or will not think that way. But that does not mean the analyst has no role; that means the analyst has to figure out how he can usefully educate someone whose method of being educated is quite different.

We did a lengthy case on the Carter administration's handling of the welfare reform issue, and, in particular, the role of Joe Califano and his analysts. Califano was very different in the way he could be reached than an Elliot Richardson, or even Casper Weinberger. Califano is a political animal and has a relatively short attention span—highly intelligent, but an action-oriented person. And one of the problems his analysts had is that they attempted to educate him in the classical, rational way without reference to any political priorities, or without attempting to couch issues and alternatives in terms that would appeal to a political, action-oriented individual. And so there was a terrible communications problem between the analysts and Califano. I think a large part of that had nothing to do with Califano's intellect or his interest in the issues; it had a great deal to do with the fact that his cognitive style and the analyst's approach just did not match.

Lynn also used the example of Jerry Brown, former Governor of California. Brown liked policy analyses framed as a debate—thesis, antithesis—because he had been trained in the Jesuitical style of argument. The challenge for a policy analyst or evaluator, then, becomes grasping the decision maker's cognitive style and logic. President Ronald Reagan, for example, liked Reader's Digest style stories and anecdotes. From Lynn's perspective, an analyst presenting to Reagan would have to figure out how to communicate policy issues through stories. He admonished analysts and evaluators to "discover those art forms by which one can present the result of one's intellectual effort" in a way that can be heard, appreciated and understood:

In my judgment, it is not as hard as it sounds. I think it is not that difficult to discover how a Jerry Brown or a Joe Califano or a George Bush or a Ted Kennedy thinks, how he reacts. All you have got to do is talk to people who deal with them continuously, or read what they say and write. And you start to discover the kinds of things that preoccupy them, the kinds of ways they approach problems. And you use that information in your policy analyses. I think the hang-up most analysts or many analysts have is that they want to be faithful to their discipline. They want to be faithful to economics or faithful to political science and are uncomfortable straying beyond what their discipline tells them they are competent at dealing with. The analyst is tempted to stay in that framework with which he or she feels most comfortable.

And so they have the hang-up, they cannot get out of it. They are prone to say that my tools, my training do not prepare me to deal with things that are on Jerry Brown's mind, therefore, I cannot help him. That is wrong. They can help, but they have got to be willing to use the information they have about how these individuals think and then begin to craft their work, to take that into account. (Lynn 1980a:86-87)

Lynn's examples document the importance of the personal factor at the highest levels of government. Alkin et al. (1979) have shown how the personal factor operates in evaluation use at state and local levels. Focusing on the personal factor provides direction about what to look for and how to proceed in planning for use.

Beyond Just Beginning

In this chapter, we've discussed the personal factor as a critical consideration in enhancing evaluation use. The importance of the personal factor explains why utilization-focused evaluators begin by identifying and organizing primary intended evaluation users. They then interact with these primary users throughout the evaluation to nurture and sustain the commitment to use. For there is an eighth deadly-use sin: identifying primary intended users at the outset of the study, then ignoring them until the final report is ready.

Attending to primary intended users is not just an academic exercise performed for its own sake. Involving specific people who can and will use information enables them to establish direction for, commitment to, and ownership of the evaluation every step along the way, from initiation of the study through the design and data collection stages right through to the final report and dissemination process. If decision makers have shown little interest in the study in its earlier stages, our data suggest that they are not likely to show a sudden interest in using the findings at the end. They won't be sufficiently prepared for use.

The remainder of this book examines the implications of focusing on intended use by intended users. We'll look at the implications for how an evaluation is conceptualized and designed (Chapters 4 through 10), methods decisions (Chapters 11 and 12), and analysis approaches (Chapter 13). We'll also look at the political and ethical implications of utilization-focused evaluation (Chapter 14).

Throughout, we'll be guided by attention to the essence of utilization-focused evaluation: focusing on intended use for specific intended users. Focus and specificity are ways of coming to grips with the fact that no evaluation can serve all potential stakeholders' interests equally well. Utilization-focused evaluation makes explicit whose interests are served. For, as Baltasar Gracian observed in 1647 in The Art of Worldly Wisdom:

It is a great misfortune to be of use to nobody; scarcely less to be of use to everybody.

Notes

1. At the time of the study in 1976, I was Director of the Evaluation Methodology Program in the Humphrey Institute of Public Affairs, University of Minnesota. The study was conducted through the Minnesota Center for Social Research, University of Minnesota. Results of the study were first published under the title "In Search of Impact: An Analysis of the Utilization of Federal Health Evaluation Research" (Patton, Grimes, et al. 1977). For details on the study's design and methods, see Patton 1986:30-39. The 20 cases in the study included 4 mental health evaluations, 4 health training programs, 2 national assessments of laboratory proficiency, 2 evaluations of neighborhood health center programs, studies of 2 health services delivery systems programs, a training program on alcoholism, a health regulatory program, a federal loan-forgiveness program, a training workshop evaluation, and 2 evaluations of specialized health facilities. The types of evaluations ranged from a three-week program

review carried out by a single internal evaluator to a four-year evaluation that cost $1.5 million. Six of the cases were internal evaluations and 14 were external.

Because of very limited resources, it was possible to select only three key informants to be contacted and intensively interviewed about the utilization of each of the 20 cases in the final sample. These key informants were (a) the government's internal project officer (PO) for the study, (b) the person identified by the project officer as being either the decision maker for the program evaluated or the person most knowledgeable about the study's impact, and (c) the evaluator who had major responsibility for the study. Most of the federal decision makers interviewed had been or now are office directors (and deputy directors), division heads, or bureau chiefs. Overall, these decision makers represented over 250 years of experience in the federal government.

The evaluators in our sample were a rather heterogeneous group. Six of the 20 cases were internal evaluations, so the evaluators were federal administrators or researchers. In one case, the evaluation was contracted from one unit of the federal government to another, so the evaluators were also federal researchers. The remaining 13 evaluations were conducted by private organizations or nongovernment employees, although several persons in this group either had formerly worked for the federal government or have since come to do so. Evaluators in our sample represented over 225 years of experience in conducting evaluative research.

2. Citations for quotes taken from the interview transcripts use the following format: [DM367:13] refers to the transcript of an interview with a decision maker about evaluation study number 367; this quote was taken from page 13 of the transcript. The study numbers and page numbers have been systematically altered to protect the confidentiality of the interviewees. EV201:10 and PO201:6 refer to interviews about the same study, the former being an interview with the evaluator, the latter an interview with the project officer.
Intended Uses of Findings

/ fyou don't know where you're going, you'll end up somewhere else.

—Yogi Berra

When Alice encounters the Cheshire Cat in Wonderland, she asks, "Would you tell me, please, which way I ought to walk from here?"
"That depends a good deal on where you want to get to," said the Cat.
"I don't much care where—" said Alice.
"Then it doesn't matter which way you walk," said the Cat.
"—so long as I get somewhere," Alice added as an explanation.
"Oh, you're sure to do that," said the Cat, "if you only walk long enough."

—Lewis Carroll

This story carries a classic evaluation message: To evaluate how well you're doing, you must have some place you're trying to get to. For programs, this has meant having goals and evaluating goal attainment. For evaluators, this means clarifying the intended uses of a particular evaluation. In utilization-focused evaluation, the primary criterion by which an evaluation is judged is intended use by intended users. The previous chapter discussed identifying intended users. This chapter will offer a menu of intended uses.

Identifying Intended Uses From the Beginning

The last chapter described a follow-up study of 20 federal health evaluations that assessed use and identified factors related to varying degrees of use. A major finding from that study was that none of our interviewees had carefully considered intended use prior to getting the evaluation's findings. We found that decision makers, program officers, and evaluators typically devoted little or no attention to intended uses prior to data collection. The goal of those evaluations was to produce findings; then they'd worry about how to use whatever was found. Findings would determine use, so until findings were generated, no real attention was paid to use.

Utilization-focused evaluators, in contrast, work with intended users to determine priority uses early in the evaluation process. The agreed-on, intended uses then become the basis for subsequent design decisions. This increases the likelihood that an evaluation will have the desired impact. Specifying intended uses is evaluation's equivalent of program goal setting.

Intended uses vary from evaluation to evaluation. There can be no generic or absolute definition of evaluation use because "use" depends in part on the values and goals of primary users. As Eleanor Chelimsky (1983) has observed, "The concept of usefulness . . . depends upon the perspective and values of the observer. This means that one person's usefulness may be another person's waste" (p. 155). To help intended users deliberate on and commit to intended uses, evaluators need a menu of potential uses to offer. Utilization-focused evaluation is a menu-oriented approach. It's a process for matching intended uses and intended users. Here, then, is a menu of three different evaluation purposes based on varying uses for evaluation findings. In the next chapter, we'll add to this menu a variety of uses of evaluation processes.

Three Uses of Findings

The purpose of an evaluation conditions the use that can be expected of it.

—Eleanor Chelimsky (1997)

You don't get very far in studying evaluation before realizing that the field is characterized by enormous diversity. From large-scale, long-term, international comparative designs costing millions of dollars to small, short evaluations of a single component in a local agency, the variety is vast. Contrasts include internal versus external evaluation; outcomes versus process evaluation; experimental designs versus case studies; mandated accountability systems versus voluntary management efforts; academic studies versus informal action research by program staff; and published, polished evaluation reports versus oral briefings and discussions where no written report is ever generated. Then there are combinations and permutations of these contrasting approaches. The annual meetings of the American Evaluation Association, the Canadian Evaluation Society, and the Australasian Evaluation Society offer an awesome cornucopia of variations in evaluation practice (and ongoing debate about which approaches are really evaluation). In the midst of such splendid diversity, any effort to reduce the complexity of evaluation options to a few major categories will inevitably oversimplify. Yet, some degree of simplification is needed to make the evaluation design process manageable. So let us attempt to heed Thoreau's advice:

    Simplicity, simplicity, simplicity! I say, let your affairs be as two or three, and not a hundred or a thousand. (Walden, 1854)

A Menu for Using Findings: Making Overall Judgments, Facilitating Improvements, and Generating Knowledge

Evaluation findings can serve three primary purposes: rendering judgments, facilitating improvements, and/or generating knowledge. Chelimsky (1997) distinguishes these three purposes by the perspective that undergirds them: judgments are undergirded by the accountability perspective; improvements are informed by a developmental perspective; and generating knowledge operates from the knowledge perspective of academic values. These are by no means inherently conflicting purposes, and some evaluations strive to incorporate all three approaches, but, in my experience, one is likely to become the dominant motif and prevail as the primary purpose informing design decisions and priority uses; or else, different aspects of an evaluation are designed, compartmentalized, and sequenced to address these contrasting purposes. I also find that confusion among these quite different purposes, or failure to prioritize them, is often the source of problems and misunderstandings along the way and can become disastrous at the end when it turns out that different intended users had different expectations and priorities. I shall discuss each, offering variations and examples.

Judgment-Oriented Evaluation

Evaluations aimed at determining the overall merit, worth, or value of something are judgment-oriented. Merit refers to the intrinsic value of a program, for example, how effective it is in meeting the needs of those it is intended to help. Worth refers to extrinsic value to those outside the program, for example, to the larger community or society. A welfare program that gets jobs for recipients has merit for those who move out of poverty and worth to society by reducing welfare costs. Judgment-oriented evaluation approaches include performance measurement for public accountability; program audits; summative evaluations aimed at deciding if a program is sufficiently effective to be continued or replicated; quality control and compliance reports; and comparative ratings or rankings of programs a la Consumer Reports.

The first clue that intended users are seeking an overall judgment is when you hear the following kinds of questions: Did the program work? Did it attain its goals? Should the program be continued or ended? Was implementation in compliance with funding mandates? Were funds used appropriately for the intended purposes? Were desired client outcomes achieved? Answering these kinds of evaluative questions requires a data-based judgment that some need has been met, some goal attained, or some standard achieved.

Another clue that rendering judgment will be a priority is lots of talk about "accountability." Funders and politicians like to issue calls for accountability (notably for others, not for themselves), and managing for accountability has become a rallying cry in both private and public sectors (Kearns 1996). Program and financial audits are aimed at ensuring compliance with intended purposes and mandated procedures. The program evaluation units of legislative audit offices, offices of comptrollers and inspectors, and federal agencies like the General Accounting Office (GAO) and the Office of Management and Budget (OMB) have government oversight responsibilities to make sure programs are properly implemented and effective. The U.S. Government Performance and Results Act of 1993 required annual performance measurement to "justify" program decisions and budgets. Political leaders in Canada, the United Kingdom, and Australia have been active and vocal in attempting to link performance measurement to budgeting for purposes of accountability (Auditor General of Canada 1993), and these efforts greatly influenced the U.S. federal approach to accountability (Breul 1994).

Rhetoric about accountability can become particularly strident in the heat of political campaigns. Everyone campaigns against ineffectiveness, waste, and fraud. Yet, one person's waste is another's jewel. For years, U.S. Senator William Proxmire of Wisconsin periodically held press conferences in which he announced Golden Fleece Awards for government programs he considered especially wasteful. I had the dubious honor of being the evaluator for one such project ridiculed by Proxmire, a project to take higher education administrators into the wilderness to experience, firsthand, experiential education. The program was easy to make fun of: Why should taxpayer dollars be spent for college deans to hike in the woods? Outrageous! What was left out of Proxmire's press release was that the project, supported by the Fund for the Improvement of Postsecondary Education, had been selected in a competitive process and funded because of its innovative approach to rejuvenating burned-out and discouraged administrators, and that many of those administrators returned to their colleges to spearhead curriculum reform. There was lots of room for debate about the merit or worth of the program depending on one's values and priorities, but our evaluation found that the funds were spent in accordance with the agency's innovative mandate and many, though not all, participants followed through on the project's goal of providing leadership for educational change. The funding agency found sufficient merit and worth that the project was awarded a year-long dissemination grant.

In judgment-oriented evaluations, specifying the criteria for judgment is central and critical. Different stakeholders will bring different criteria to the table (or apply them without coming to the table, as in Proxmire's case). The funding agency's criterion was whether participants developed personally and professionally in ways that led them to subsequently exercise innovative leadership in higher education; Proxmire's criterion was whether, on the surface, the project would sound wasteful to the ordinary taxpayer.

During design discussions and negotiations, evaluators may offer additional criteria for judgment beyond those initially thought of by intended users. As purpose and design negotiations conclude, the standard to be met by the evaluation has been articulated in the Joint Committee Program Evaluation Standards:

    Values Identification: The perspectives, procedures, and rationale used to interpret the findings should be carefully described, so that the bases for value judgments are clear. (Joint Committee 1994:U4; emphasis added)

Some criteria, such as fraud and gross incompetence, are sufficiently general and agreed-on that they may remain implicit as bases for value judgments when the

explicit focus is on goal attainment. Yet, finding criminal, highly unethical, or grossly incompetent actions will quickly overwhelm other effectiveness criteria. One of my favorite examples comes from an audit of a weatherization program in Kansas as reported in the newsletter of Legislative Program Evaluators.

    Kansas auditors visited several homes that had been weatherized. At one home, workers had installed 14 storm windows to cut down on air filtration in the house. However, one could literally see through the house because some of the siding had rotted and either pulled away from or fallen off the house. The auditors also found that the agency had nearly 200 extra storm windows in stock. Part of the problem was that the supervisor responsible for measuring storm windows was afraid of heights; he would "eyeball" the size of second-story windows from the ground. . . . If these storm windows did not fit, he ordered new ones. (Hinton 1988:3)

The auditors also found fraud. The program bought windows at inflated prices from a company secretly owned by a program employee. A kickback scheme was uncovered. "The workmanship on most homes was shoddy, bordering on criminal. . . . [For example], workers installing a roof vent used an ax to chop a hole in the roof." Some 20% of beneficiaries didn't meet eligibility criteria. Findings like these are thankfully rare, but they grab headlines when they become public, and they illustrate why accountability will remain a central purpose of many evaluations.

The extent to which concerns about accountability dominate a specific study varies by the role of the evaluator. For auditors, accountability is always primary. Public reports on performance indicators for government programs are accountability-driven. As we shall see, however, many evaluations of private sector programs aimed at internal program improvement have no public accountability purpose. First, however, let's review summative evaluation as a major form of judgment-oriented evaluation.

Summative evaluation constitutes an important purpose distinction in any menu of intended uses. Summative evaluations judge the overall effectiveness of a program and are particularly important in making decisions about continuing or terminating an experimental program or demonstration project. As such, summative evaluations are often requested by funders. Summative evaluation contrasts with formative evaluation, which focuses on ways of improving and enhancing programs rather than rendering definitive judgment about effectiveness. Michael Scriven (1967:40-43) introduced the summative-formative distinction in discussing evaluation of educational curriculum. The distinction has since become a fundamental evaluation typology.

With widespread use of the summative-formative distinction has come misuse, so it is worth examining Scriven's (1991a) own definition:

    Summative evaluation of a program (or other evaluand) is conducted after completion of the program (for ongoing programs that means after stabilization) and for the benefit of some external audience or decision-maker (for example, funding agency, oversight office, historian, or future possible users). . . . The decisions it services are most often decisions between these options: export (generalize), increase site support, continue site support, continue with conditions (probationary status), continue with modifications, discontinue. . . . The aim is to report on it [the program], not to report to it. (p. 340)

Summative evaluation provides data to support a judgment about the program's worth so that a decision can be made about the merit of continuing the program. While Scriven's definition focuses on a single program, summative evaluations of multiple programs occur when, like the products in a Consumer Reports test, programs are ranked on a set of criteria such as effectiveness, cost, sustainability, quality characteristics, and so on. Such data support judgments about the comparative merit or worth of different programs.

In judgment-oriented evaluations, what Scriven (1980) has called "the logic of valuing" rules. Four steps are necessary: (1) select criteria of merit; (2) set standards of performance; (3) measure performance; and (4) synthesize results into a judgment of value (Shadish, Cook, and Leviton 1991:73, 83-94). This is clearly a deductive approach. In contrast, improvement-oriented evaluations often use an inductive approach in which criteria are less formal as one searches openly for whatever areas of strengths or weaknesses may emerge from looking at what's happening in the program.

Improvement-Oriented Evaluation

Using evaluation results to improve a program turns out, in practice, to be fundamentally different from rendering judgment about overall effectiveness, merit, or worth. Improvement-oriented forms of evaluation include formative evaluation, quality enhancement, responsive evaluation, learning organization approaches, humanistic evaluation, and Total Quality Management (TQM), among others. What these approaches share is a focus on improvement—making things better—rather than rendering summative judgment. Judgment-oriented evaluation requires preordinate, explicit criteria and values that form the basis for judgment. Improvement-oriented approaches tend to be more open ended, gathering varieties of data about strengths and weaknesses with the expectation that both will be found and each can be used to inform an ongoing cycle of reflection and innovation. Program management, staff, and sometimes participants tend to be the primary users of improvement-oriented findings, while funders and external decision makers tend to use judgmental evaluation, though I hasten to add that these associations of particular categories of users with specific types of evaluations represent utilization tendencies, not definitional distinctions; any category of user may be involved in any kind of use.

Improvement-oriented evaluations ask the following kinds of questions: What are the program's strengths and weaknesses? To what extent are participants progressing toward the desired outcomes? Which types of participants are making good progress and which types aren't doing so well? What kinds of implementation problems have emerged and how are they being addressed? What's happening that wasn't expected? How are staff and clients interacting? What are staff and participant perceptions of the program? What do they like? dislike? want to change? What are perceptions of the program's culture and climate? How are funds being used compared to initial expectations? How is the program's external environment affecting internal operations? Where can efficiencies be realized? What new ideas are emerging that can be tried out and tested?

The flavor of these questions—their nuances, intonation, feel—communicates improvement rather than judgment. Bob Stake's metaphor explaining the difference between summative and formative evaluation can be adapted more generally to the distinction between judgmental evaluation and improvement-oriented evaluation: "When the cook tastes the soup, that's formative; when the guests taste the soup, that's summative" (quoted in Scriven 1991a:169). More generally, anything done to the soup during preparation in the kitchen is improvement oriented; when the soup is served, judgment is rendered, including judgment rendered by the cook that the soup was ready for serving (or at least that preparation time had run out).

The metaphor also helps illustrate that one must be careful to stay focused on intent rather than activities when differentiating purposes. Suppose that those to whom the soup is served are also cooks, and the purpose of their tasting the soup is to offer additional recipe ideas and consider potential variations in seasoning. Then the fact that the soup has moved from kitchen to table does not mean a change in purpose. Improvement remains the primary agenda. Final judgment awaits another day, a different serving—unless, of course, the collection of cooks suddenly decides that the soup as served to them is already perfect and no further changes should be made. Then what was supposed to be formative would suddenly have turned out to be summative. And thus are purposes and uses often confounded in real-world evaluation practice.

Formative evaluation typically connotes collecting data for a specific period of time, usually during the start-up or pilot phase of a project, to improve implementation, solve unanticipated problems, and make sure that participants are progressing toward desired outcomes. Improvement-oriented evaluation more generally, however, includes using information systems to monitor program efforts and outcomes regularly over time to provide feedback for fine-tuning a well-established program. That's how data are meant to be used as part of a Total Quality Management (TQM) approach. It also includes the "decision-oriented educational research" of Cooley and Bickel (1985), who built a classroom-based information system aimed at "monitoring and tailoring." For example, by systematically tracking daily attendance patterns for individuals, classrooms, and schools, educational administrators could quickly identify attendance problems and intervene before the problems became chronic or overwhelming. Attendance could also be treated as an early warning indicator of other potential problems.

Again, I want to reiterate that we are focusing on distinctions about intended and actual use of findings. A management information system that routinely collects data can be used for monitoring progress and reallocating resources for increased effectiveness in a changing environment; that's improvement-oriented use. However, if that same system is used for public accountability reporting, that's judgment-oriented use. These contrasting purposes often come into conflict because the information needed for management is different from the data needed for accountability; or knowing that the data will be used for accountability purposes, the system is set up and managed for that purpose and becomes useless for ongoing improvement-oriented decision making. Exhibit 4.1 provides an example of how formative evaluation can prepare a program for summative evaluation by connecting these separate and distinct evaluation purposes to separate and distinct stages in the program's development.

EXHIBIT 4.1
Formative and Summative Evaluation of the Saint Paul Technology for Literacy Center (TLC): A Utilization-Focused Model

TLC was established as a three-year demonstration project to pilot-test the effectiveness of an innovative, computer-based approach to adult literacy. The pilot project was funded by six Minnesota foundations and the Saint Paul Schools at a cost of $1.3 million. The primary intended users of the evaluation were the school superintendent, senior school officials, and School Board Directors, who would determine whether to continue and integrate the project into the district's ongoing community education program. School officials and foundation donors participated actively in designing the evaluation. The evaluation cost $70,300.

After 16 months of formative evaluation, the summative evaluation began. The formative evaluation, conducted by an evaluator hired to be part of the TLC staff, used extensive learner feedback, careful documentation of participation and progress, and staff development activities to specify the TLC model and bring implementation to a point of stability and clarity where it could be summatively evaluated. The summative evaluation, conducted by two independent University of Minnesota social scientists, was planned as the formative evaluation was being conducted.

The summative evaluation began by validating that the specified model was, in fact, being implemented as specified. This involved interviews with staff and students and observations of the program in operation. Outcomes were measured using the Test of Adult Basic Education administered on a pre-post basis to participant and control groups. The test scores were analyzed for all students who participated in the program for a three-month period. Results were compared to data available on other adult literacy programs. An extensive cost analysis was also conducted by a university educational economist. The report was completed six months prior to the end of the demonstration, in time for decision makers to use the results to determine the future of the program. Retention and attrition data were also analyzed and compared with programs nationally.

Comparisons showed significant gains in reading comprehension and math for the participant group versus no gains for the control group. Adult learners in the program advanced an average of one grade level on the test for every 52.5 hours spent in TLC computer instruction. However, the report cautioned that the results showed great variation: high standard deviations, significant differences between means and medians, ranges of data that included bizarre extremes, and very little correlation between hours spent and progress made. The report concluded: "Each case is relatively unique. TLC has created a highly individualized program where learners can proceed at their own pace based on their own needs and interests. The students come in at very different levels and make very different gains during their TLC work . . . , thus the tremendous variation in progress" (Council on Foundations, 1993).

Several years after the evaluation, the Council on Foundations commissioned a follow-up study on the evaluation's utility. The Saint Paul Public Schools moved the project from pilot to permanent status. The Superintendent of Schools reported that "the findings of the evaluation and the qualities of the services it had displayed had irrevocably changed the manner in which adult literacy will be addressed throughout the Saint Paul Public Schools" (Council on Foundations, 1993:148). TLC also became the basis for the District's new Five-Year Plan for Adult Literacy. The evaluation was so well-received by its original philanthropic donors that it led the Saint Paul Foundation to begin and support an Evaluation Fellows program with the University of Minnesota. The independent Council on Foundations follow-up study concluded: "Everyone involved in the evaluation—TLC, funding sources, and evaluators—regards it as a 'utilization-focused evaluation.' The organization and its founders and funders decided what they wanted to learn and instructed the evaluators accordingly" (Council on Foundations, 1993:154-55). The formative evaluation was used extensively to develop the program and get it ready for the summative evaluation. The summative evaluation was then used by primary intended users to inform a major decision about the future of computer-based adult literacy. Ten years later, Saint Paul's adult literacy effort continues to be led by TLC's original developer and

SOURCES: Turner and Stockdill (1987); Council on Foundations (1993:129-55).

Knowledge-Oriented Evaluation

Both judgment-oriented and improvement-oriented evaluations involve the instrumental use of results (Leviton and Hughes 1981). Instrumental use occurs when a decision or action follows, at least in part, from the evaluation. Evaluations are seldom the sole basis for subsequent summative decisions or program improvements, but they contribute, often substantially, if a utilization-focused approach is used at the design stage.

Conceptual use of findings, on the other hand, contrasts with instrumental use in that no decision or action is expected; rather, it "is the use of evaluations to influence thinking about issues in a general way" (Rossi and Freeman 1985:388). The evaluation findings contribute by increasing knowledge. This knowledge can be as specific as clarifying a program's model, testing theory, distinguishing types of interventions, figuring out how to measure outcomes, generating lessons learned, and/or elaborating policy options. In other cases, conceptual use is more vague, with users seeking to understand the program better; the findings, then, may reduce uncertainty, offer illumination, enlighten funders and staff about what participants really experience, enhance communications, and facilitate sharing of perceptions. In early studies of utilization, such uses were overlooked or denigrated. In recent years, they have come to be more appreciated and valued (Weiss 1990:177).

We found conceptual use to be widespread in our follow-up study of federal health evaluations. As one project manager reported:

    The evaluation led us to redefine some target populations and rethink the ways we connected various services. This rethinking happened over a period of months as we got a

better perspective on what the findings or ongoing improvement, the connection

meant. But we didn't so much change what to social science theory tends to focus on
we were doing as we changed how we increasing knowledge about h o w effec-
thought about what we were doing. That has tive programs w o r k in general. For ex-
had big pay-offs over time. We're just a lot ample, Shadish (1987) has argued that
clearer now. [DM248:19] the understandings gleaned from evalu-
ations ought to contribute to macrotheo-
This represents an example of concep- ries about " h o w to p r o d u c e i m p o r t a n t
tual use that is sometimes described as social change" (p. 94). Scheirer (1987)
enlightenment. Carol Weiss (1990) has has contended that evaluators ought t o
used this t e r m to describe the effects of draw on and contribute to implementa-
evaluation findings being disseminated to tion theory to better understand the "what
the larger policy c o m m u n i t y "where they and why of program delivery" (p. 59).
have a chance to affect the terms of de- Such knowledge-gene*rating efforts focus
bate, the language in which it is con- beyond the effectiveness of a particular
ducted, and the ideas that are considered program to future p r o g r a m designs and
relevant in its resolution" (p. 176). She policy formulation in general.
As the field of evaluation has matured
and a vast number of evaluations has accu-
Generalizations from evaluation can perco- mulated, the opportunity has arisen to look
late into the stock of knowledge that partici- across findings about specific programs to
pants draw on. Empirical research has con- formulate generalizations about effective-
firmed this. . . . Decision makers indicate a
ness. This involves synthesizing findings
strong belief that they are influenced by the
from different studies. (It is important to
ideas and arguments that have their origins
distinguish this form of synthesis evalu-
in research and evaluation. Case studies of
ation, that is, synthesizing across different
evaluations and decisions tend to show that
studies, from what Scriven [1994] calls "the
generalizations and ideas that come from
final synthesis," which refers to sorting out
research and evaluation, help shape the de-
and weighing the findings in a single study
velopment of policy. The phenomenon has
to reach a summative judgment.) Cross-
come to be known as "enlightenment" . . . ,
study syntheses have become an important
an engaging idea. The image of evaluation as
contribution of the GAO (1992c) in pro-
increasing the wattage of light in the policy
viding accumulated wisdom to Congress
arena brings joy to the hearts of evaluators.
about h o w to formulate effective policies
(pp. 176-77)
and programs. An example is GAO's
(1992b) report on Adolescent Drug Use
While Weiss has emphasized the infor-
Prevention: Common Features of Promising
mal ways in which evaluation findings
Community Programs. (See Exhibit 4.2.)
provide, over time, a knowledge base for
policy, C h e n (1990, 1 9 8 9 ; Chen and An excellent and important example of
Rossi 1987) has focused on a more formal synthesis evaluation is Lisbeth Schorr's
knowledge-oriented approach in what he (1988) Within Our Reach, a study of pro-
has called theory-driven evaluation. While grams aimed at breaking the cycle of pov-
theory-driven evaluations can provide erty. She identified "the lessons of success-
program models for summative judgment ful programs" as follows (pp. 256-83):
Intended Uses of Findings • 73

An Example of a Knowledge-Oriented Evaluation

The U.S. General Accounting Office (GAO 1992a) identified "Common Features of Promising
Community Programs" engaged in adolescent drug use prevention. The evaluation was aimed
at enlightening policymakers, in contrast to other possible uses of findings, namely, judging the
effectiveness of or improving specific programs.

Six features associated with high levels of participant enthusiasm and attachment:

1. a comprehensive strategy
2. an indirect approach to drug abuse prevention
3. the goal of empowering youth
4. a participatory approach
5. a culturally sensitive orientation
6. highly structured activities

Six common program problems:

1. maintaining continuity with their participants
2. coordinating and integrating their service components
3. providing accessible services
4. obtaining funds
5. attracting necessary leadership and staff
6. conducting evaluation

• offering a broad spectrum of services;
• regularly crossing traditional professional and bureaucratic boundaries, that is, organizational flexibility;
• seeing the child in the context of family and the family in the context of its surroundings, that is, holistic approaches;
• coherent and easy-to-use services;
• committed, caring, results-oriented staff;
• finding ways to adapt or circumvent traditional professional and bureaucratic limitations to meet client needs;
• professionals redefining their roles to respond to severe needs; and
• overall, intensive, comprehensive, responsive, and flexible programming.

These kinds of "lessons" constitute accumulated wisdom—principles of effectiveness or "best practices"—that can be adapted, indeed, must be adapted, to specific programs, or even entire organizations (Wray and Hauer 1996). For example, the Ford Foundation commissioned an evaluation of its Leadership Program for Community Foundations. This study of 27 community foundations over five years led to a guide for Building Community Capacity (Mayer 1996, 1994, n.d.) that incorporates lessons learned and generalizable development strategies for community foundations—a distinguished and useful example of a knowledge-generating evaluation. Other examples include a special evaluation issue of Marriage and Family Review devoted to "Exemplary Social Intervention Programs" (Guttman and Sussman 1995) and a special issue of The Future of Children (CFC 1995) devoted to "Long-Term Outcomes of Early Childhood Programs."

In the philanthropic world, a related approach has come to be called cluster evaluation (Millett 1996; Council on Foundations 1993:232-51). A cluster evaluation team visits a number of different grantee projects with a similar focus (e.g., grassroots leadership development) and draws on individual grant evaluations to identify patterns across and lessons from the whole cluster (Campbell 1994; Sanders 1994; Worthen 1994; Barley and Jenness 1993; Kellogg Foundation n.d.). The McKnight Foundation commissioned a cluster evaluation of 34 separate grants aimed at aiding families in poverty. One lesson learned was that "effective programs have developed processes and strategies for learning about the strengths as well as the needs of families in poverty" (Patton et al. 1993:10). This lesson takes on added meaning when connected with the finding of Independent Sector's review of "Common Barriers to Effectiveness in the Independent Sector":

The deficits model holds that distressed people and communities are "needy"; they're a collection of problems, pathologies, and handicaps; they need doctoring, rehabilitation, and fixing of the kind that professionalized services are intended to provide. The assets model holds that even the most distressed person or community has strengths, abilities, and capacities; with investment, their strengths, abilities, and capacities can increase. This view is only barely allowed to exist in the independent sector, where organizations are made to compete for funds on the basis of "needs" rather than on the basis of "can-do."

The deficit model—seeing the glass half empty—is a barrier to effectiveness in the independent sector. (Mayer 1993:7-8)

The McKnight Foundation cluster evaluation and the Independent Sector study reached similar conclusions concurrently and independently. Such generalizable evaluation findings about principles of effective programming have become the knowledge base of our profession. Being knowledgeable about patterns of program effectiveness allows evaluators to provide guidance about development of new initiatives, policies, and strategies for implementation. Such contributions constitute the conceptual use of evaluation findings. Efforts of this kind may be considered research rather than evaluation, but such research is ultimately evaluative in nature and important to the profession.

Synthesis evaluations also help us generate knowledge about conducting useful evaluations. The premises of utilization-focused evaluation featured in this book originally emerged from studying 20 federal evaluations (Patton, Grimes, et al. 1977). Those premises have been affirmed by Alkin et al. (1979) in the model of evaluation use they developed by analyzing evaluations from different education districts in California and by Wargo (1989) in

his "characteristics of successful program evaluations" identified by studying three "unusually successful evaluations of national food and nutrition programs" (p. 71). The Council on Foundations commissioned a synthesis evaluation based on nine case studies of major foundation evaluations to learn lessons about "effective evaluating." (A summary of one of those case studies is presented as Exhibit 4.1 in this chapter.) Among the Council's 35 key lessons learned is this utilization-focused evaluation premise: "Key 6. Make sure the people who can make the most use of the evaluation are involved as stakeholders in planning and carrying out the evaluation" (Council on Foundations 1993:255).

Applying Purpose and Use Distinctions

By definition, the three kinds of uses we've examined—making overall judgments, offering formative improvements, or generating generic knowledge—can be distinguished clearly. Menu 4.1 presents these three uses with examples of each. Although conceptually distinct, in practice these uses can become entangled. Let me illustrate with an evaluation of an innovative educational program.

Some years ago, the Northwest Regional Educational Laboratory (NWREL) contracted with the Hawaii State Department of Education to evaluate Hawaii's experimental "3-on-2 Program," a team teaching approach in which three teachers worked with two regular classrooms of primary-age children, often in multi-age groupings. Walls between classrooms were removed so that three teachers and 40 to 60 children shared one large space. The program was aimed at creating greater individualization, increasing cooperation among teachers, and making more diverse resources available to students.

The laboratory (NWREL 1977) proposed an advocacy-adversary model for summative evaluation. Two teams were created; by coin toss, one was designated the advocacy, the other the adversary team. The task of the advocacy team was to gather and present data supporting the proposition that Hawaii's 3-on-2 Program was effective and ought to be continued. The adversaries were charged with marshalling all possible evidence demonstrating that the program ought to be terminated.

The advocacy-adversary model was a combination debate/courtroom approach to evaluation (Wolf 1975; Kourilsky 1974; Owens 1973). I became involved as a resource consultant on fieldwork as the two teams were about to begin site visits to observe classrooms. When I arrived on the scene, I immediately felt the exhilaration of the competition. I wrote in my journal,

No longer staid academic scholars, these are athletes in a contest that will reveal who is best; these are lawyers prepared to use whatever means necessary to win their case. The teams have become openly secretive about their respective strategies. These are experienced evaluators engaged in a battle not only of data, but also of wits. The prospects are intriguing.

As the two teams prepared their final reports, a concern emerged among some about the narrow focus of the evaluation. The summative question concerned whether the Hawaii 3-on-2 Program should be continued or terminated. Some team members also wanted to offer findings about how to change the program or how to make it better without terminating it. Was it possible that a great amount of

MENU 4.1

Three Primary Uses of Evaluation Findings

Uses                  Examples

Judge merit or worth  Summative evaluation
                      Quality control
                      Cost-benefit decisions
                      Decide a program's future

Improve programs      Formative evaluation
                      Identify strengths and weaknesses
                      Continuous improvement
                      Quality enhancement
                      Being a learning organization
                      Manage more effectively
                      Adapt a model locally

Generate knowledge    Generalizations about effectiveness
                      Extrapolate principles about what works
                      Theory building
                      Synthesize patterns across programs
                      Scholarly publishing
                      Policy making

NOTE: Menu 5.1 (Chapter 5) presents a corresponding menu, "Four Primary Uses of Evaluation Logic and
Processes," which includes Enhancing shared understandings, Reinforcing program interventions,
Engagement (participatory and empowerment evaluation), and Developmental evaluation. Menu 5.1 presents
uses where the impact on the program comes primarily from application of evaluation thinking and engaging
in an evaluation process in contrast to impacts that come from using the content of evaluation findings, the
focus of this menu.

time, effort, and money was directed at answering the wrong question? Two participating evaluators summarized the dilemma in their published post mortem of the project:

As we became more and more conversant with the intricacies, both educational and political, of the Hawaii 3-on-2 Program, we realized that Hawaii's decision makers should not be forced to deal with a simple save-it-or-scrap-it choice. Middle ground positions were more sensible. Half-way measures, in this instance, probably made more sense. But there we were, obliged to do battle with our adversary colleagues on the unembellished question of whether to maintain or terminate the 3-on-2 Program. (Popham and Carlson 1977:5)

In the course of doing fieldwork, the evaluators had encountered many stakeholders who favored a formative evaluation approach. These potential users wanted an assessment of strengths and weaknesses with ideas for improvement. Many doubted that the program, given its popularity, could be terminated. They recognized that changes were needed, especially cost reductions, but that fell in the realm of formative, not summative, evaluation. I had a conversation with one educational policymaker that highlighted the dilemma about appropriate focus. He emphasized that, with a high rate of inflation, a declining school-age population, and reduced federal aid, the program was too expensive to maintain. "That makes it sound like you've already made the decision to terminate the program before the evaluation is completed," I suggested.

"Oh, no!" he protested. "All we've decided is that the program has to be changed. In some schools the program has been very successful and effective. Teachers like it; parents like it; principals like it. How could we terminate such a program? But in other schools it hasn't worked very well. The two-classroom space has been redivided into what is essentially three self-contained classrooms. We know that. It's the kind of program that has some strong political opposition and some strong political support. So there's no question of terminating the program and no question of keeping it the same."

I felt compelled to point out that the evaluation was focused entirely on whether the program should be continued or terminated. "And that will be very interesting," he agreed. "But afterwards we trust you will give us answers to our practical questions, like how to reduce the size of the program, make it more cost effective, and increase its overall quality."

Despite such formative concerns from some stakeholders, the evaluation proceeded as originally planned with the focus on the summative evaluation question. But was that the right focus? The evaluation proposal clearly identified the primary intended users as state legislators, members of the State Board of Education, and the superintendent. In a follow-up survey of those education officials (Wright and Sachse 1977), most reported that they got the information they wanted. But the most important evidence that the evaluation focused on the right question came from actions taken following the evaluation, when the decision makers decided to eliminate the program.

After it was all over, I had occasion to ask Dean Nafziger, who had directed the evaluation as director of evaluation, research, and assessment for NWREL, whether a shift to a formative focus would have been appropriate. He replied,

We maintained attention to the information needs of the true decision makers, and adhered to those needs in the face of occasional counter positions by other evaluation audiences. . . . If a lesson is to be learned it is this: an evaluator must determine who is making the decisions and keep the information needed by the decision makers as the highest priority. In the case of the Hawaii "3 on 2" evaluation, the presentation of program improvement information would have served to muddle the decision-making process. (Personal correspondence 1979)

Choosing Among Alternatives

As the Hawaii case illustrates, the formative-summative distinction can be critical. Formative and summative evaluations involve significantly different research foci. The same data seldom serve both purposes well. Nor will either a specific formative or summative evaluation necessarily yield generic knowledge (lessons learned) that can be applied to effective programming more generally. It is thus important to identify the primary purpose of the evaluation at the outset: overall judgment of merit or worth, ongoing improvement, or knowledge generation? Other decisions about what to do in the evaluation can then be made in accordance with how best to support that primary purpose. One frequent reaction to posing evaluation alternatives is: "We want to do it all." A comprehensive evaluation, conducted over time and at different levels, may include all three uses, but for any given evaluation activity, or any particular stage of evaluation, it's critical to have clarity about the priority use of findings.

Consider the evaluation of a leadership program run by a private philanthropic foundation. The original evaluation contract called for three years of formative evaluation followed by two years of summative evaluation. The program staff and evaluators agreed that the formative evaluation would be for staff and participant use; however, the summative evaluation would be addressed to the foundation's board of directors. The formative evaluation helped shape the curriculum, brought focus to intended outcomes, and became the basis for the redesign of follow-up activities and workshops. As time came to make the transition from formative to summative evaluation, the foundation's president got cold feet about having the evaluators meet directly with the board of directors. The evaluators insisted on interacting directly with these primary users to lay the groundwork for genuinely summative decision making. Senior staff decided that no summative decision was imminent, so the evaluation continued in a formative mode, and the design was changed accordingly. As a matter of ethics, the evaluators made sure that the chair of the board was involved in these negotiations and that the board agreed to the change in focus. There really was no summative decision on the horizon because the foundation had a long-term commitment to the leadership program.

Now, consider a different case, the evaluation of an innovative school, the Saturn School, in Saint Paul, Minnesota. Again, the original evaluation design called for three years of formative evaluation followed by two final years with a summative focus. The formative evaluation revealed some developmental problems, including lower than desired scores on district-mandated standardized tests. The formative evaluation report, meant only for internal discussion aimed at program improvement, got into the newspapers with glaring headlines about problems and low test scores. The evaluation's visibility and public reporting put pressure on senior district officials to make summative decisions about the program, despite earlier assurances that the program would have a full five years before such decisions were made. The formative evaluation essentially became summative when it hit the newspapers, much to the chagrin of staff.

Sometimes, however, program staff like such a reversal of intended use when, for example, evaluators produce a formative report that is largely positive and staff want to disseminate the results as if they were summative, even though the methods of the formative evaluation were aimed only at capturing initial perceptions of program progress, not at rendering an overall judgment of merit or worth. Keeping formative evaluations formative, and summative evaluations summative, is an ongoing challenge, not a one-time decision. When contextual conditions merit or mandate a shift in focus, evaluators need to work with intended users to fully understand the consequences of such a change. We'll discuss these issues again in the chapter on situational responsiveness and evaluator roles. Let me close this section with one final example.

A national foundation funded a cluster evaluation in which a team of evaluators would assemble data from some 30 different projects and identify lessons for effective community-based health programming—essentially a knowledge-generating evaluation. The cluster evaluation team had no responsibility to gather data to improve specific programs nor to make summative judgments. Each separate project had its own evaluation for those purposes. The cluster evaluation was intended to look for patterns of effectiveness (and barriers to same) across projects. Yet, during site visits, individual projects provided cluster evaluators with a great deal of formative feedback that they wanted communicated to the foundation, and individual grantees were hungry for feedback and comparative insights about how well they were doing and ways they might improve. As the evaluation approached time for a final report, senior foundation officials and trustees asked for summative conclusions about the overall effectiveness of the entire program area as part of rethinking funding priorities and strategies. Thus, a knowledge-generating evaluation got caught up in pressures to adapt to meet demands for both formative and summative uses.

Evaluation Use and Decision Making: Being Realistic About Impact

All three uses of evaluation findings—to render judgment, to improve programs, and to generate knowledge—support decision making. The three kinds of decisions, however, are quite different.

1. Rendering judgment about overall merit or worth for summative purposes supports decisions about whether a program should be continued, enlarged, disseminated, or terminated—all major decisions.

2. Decisions about how to improve a program tend to be made in small, incremental steps based on specific evaluation findings aimed purposefully at instrumental use.

3. Policy decisions informed by cumulative knowledge from evaluations imply a weak and diffuse connection between specific evaluation findings and the eventual decision made—thus the term enlightenment.

Trying to sort out the influence of evaluations on decisions has been a major focus of researchers studying use. Much of the early literature on program evaluation defined use as immediate, concrete, and observable influence on specific decisions and program activities resulting directly from evaluation findings. For example, Carol Weiss (1972c), one of the pioneers in studying use, stated, "Evaluation research is meant for immediate and direct use in improving the quality of social programming" (p. 10). It was with reference to immediate and direct use that Weiss was speaking when she concluded that "a review of evaluation experience suggests that evaluation results have generally not exerted significant influence on

program decisions" (p. 11). Weiss (1988, 1990) reaffirmed this conclusion in her 1987 keynote address at the American Evaluation Association: "The influence of evaluation on program decisions has not noticeably increased" (p. 7). The evaluation literature reviewed in the first chapter was likewise overwhelming in concluding that evaluation studies exert little influence in decision making.

It was in this gloomy context that I set out with a group of students in search of evaluations that had actually been used to help us identify factors that might enhance use in the future. (Details about this follow-up study of the use of federal health evaluations were presented in Chapter 3 and in Patton, Grimes, et al. 1977.) Given the pessimistic picture of most writings on use, we began our study fully expecting our major problem would be finding even one evaluation that had had a significant impact on program decisions. What we found was considerably more complex and less dismal than our original impressions had led us to expect. Our results provide guidance in how to work with intended users to set realistic expectations about how much influence an evaluation will have.

Views From the Field on Evaluation Impact

Our major question on use was as follows:

We'd like to focus on the actual impact of this evaluation study . . . , to get at any ways in which the study may have had an impact—an impact on program operations, on planning, on funding, on policy, on decisions, on thinking about the program, and so forth. From your point of view, what was the impact of this evaluation study on the program we've been discussing?

After coding responses for the nature and degree of impact (Patton 1986:33), we found that 78% of responding decision makers and 90% of responding evaluators felt that the evaluation had an impact on the program. We asked a follow-up question about the nonprogram impacts of the evaluations:

We've been focusing mainly on the study's impact on the program itself. Sometimes studies have a broader impact on things beyond an immediate program, things like general thinking on issues that arise from a study, or position papers, or legislation. To what extent and in what ways did this evaluation have an impact on any of these kinds of things?

We found that 80% of responding decision makers and 70% of responding evaluators felt these specific evaluation studies had identifiable nonprogram impacts.

The positive responses to the questions on impact are quite striking considering the predominance of the impression of nonuse in the evaluation literature. The main difference here, however, was that the actual participants in each specific evaluation process were asked to define impact in terms that were meaningful to them and their situations. None of the evaluations we studied led directly and immediately to the making of a major, concrete program decision. The more typical impact was one in which the evaluation provided additional pieces of information in the difficult puzzle of program action, permitting some reduction in the uncertainty within which any decision maker inevitably operates. In most such cases, though the use was modest,

those involved considered the evaluation worthwhile.

The most dramatic example of use reported in our sample was evaluation of a pilot program. The program administrator had been favorable to the program in principle, was uncertain what the evaluation results would be, but was "hoping the results would be positive." The evaluation proved to be negative. The administrator was "surprised, but not alarmingly so. . . . We had expected a more positive finding or we would not have engaged in the pilot studies" [DM367:13]. The program was subsequently ended, with the evaluation carrying "about a third of the weight of the total decision" [DM367:8]. Thus, the evaluation served a summative purpose but was one of only several factors (politics, impressions already held, competing priorities and commitments) that influenced the decision.

Contrast such summative use with the experiences of a different decision maker we interviewed, one who had 29 years of experience in the federal government, much of that time directing research. He reported the impact of the evaluation about which he was interviewed as follows:

It served two purposes. One is that it resolved a lot of doubts and confusions and misunderstandings that the advisory committee had . . . and the second was that it gave me additional knowledge to support facts that I already knew, and, as I say, broadened the scope more than I realized. In other words, the perceptions of where the organization was going and what it was accomplishing were a lot worse than I had anticipated . . . but I was somewhat startled to find out that they were worse, yet it wasn't very hard because it partly confirmed things that I was observing. [DM232:17]

He went on to say that, following the evaluation:

We changed our whole functional approach to looking at the identification of what we should be working on. But again I have a hard time because these things, none of these things occurred overnight, and in an evolutionary process it's hard to say, you know, at what point it made a significant difference or did it merely verify and strengthen the resolve that you already had. [DM232:17]

As in this example of conceptual use, respondents frequently had difficulty assessing the degree to which an evaluation actually affected decisions made after completion of the evaluation. This was true, for example, in the case of a large-scale evaluation conducted over several years at considerable cost. The findings revealed some deficiencies in the program but overall were quite positive. Changes corresponding to those recommended in the study occurred when the report was published, but those changes could not be directly and simply attributed to the evaluation:

A lot of studies like this confirmed what close-by people knew and they were already taking actions before the findings. So you can't link the finding to the action, that's just confirmation. . . . The direct link between the finding and the program decision is very diffuse. [DM361:12, 13]

In essence, we found that evaluations provided some additional information that was judged and used in the context of other available information to help reduce the unknowns in the making of incremental program changes. The impact ranged from "it sort of confirmed our impressions . . . , confirming some other

anecdotal information or impression that we had" [DM209:7, 1] to providing a new awareness that carried over to other programs.

This kind of conceptual use to stimulate thinking about what's going on and reduce uncertainty emerged as highly important to decision makers. In some cases, it simply made them more confident and determined. On the other hand, where a need for change was indicated, an evaluation study could help speed up the process of change or provide a new impetus for finally getting things rolling. Reducing uncertainty, speeding things up, and getting things finally started are real impacts—not revolutionary—but real, important impacts in the opinion of the people we interviewed. We found few major, direction-changing decisions in most programs—few really summative decisions. Rather, evaluation findings were used as one piece of information that fed into a slow, evolutionary process of program development. Program development is typically a process of "muddling through" (Allison 1971; Lindblom 1965, 1959), and program evaluation is part of that muddling. Or, as Weiss (1980) has observed, even major decisions typically accrete gradually over time through small steps and minor adjustments rather than getting decided all at once at some single moment at the end of a careful, deliberative, and rational process.

The impacts of evaluation have most often been felt as ripples, not waves. The question is whether such limited impact is sufficient to justify the costs of evaluation. The decision makers and evaluators we interviewed 20 years ago were largely satisfied with the type and degree of use they experienced. But times have changed. The stakes are higher. There's more sophistication about evaluation and, I think, higher expectations for accountability. However, the point of a utilization-focused approach is not to assume either high or low expectations. The point is to find out what the expectations of intended users are and negotiate a shared understanding of realistic, intended use—a mutual commitment that can be met. In negotiating the nature and degree of evaluation use, that is, setting goals for the evaluation, it is important to challenge intended users to be both optimistic and realistic—the twin tensions in any goal-setting exercise. Whether the expected type and degree of use hoped for actually occurs can then be followed up as a way of evaluating the evaluation.

In part, we need to distinguish a goals-oriented, up-front definition of use from an after-the-fact, follow-up definition of use. King and Pechman (1984, 1982) defined use as "intentional and serious consideration of evaluation information by an individual with the potential to act on it" (1984:244). This definition recognizes that evaluation is only one influence among many in the taking of an action or making of a decision; therefore, it is reasonable to consider an evaluation used if it has been seriously considered and the findings genuinely taken into account. Such a definition makes sense when evaluators are trying to study use after the fact and sort out relative influences. But the question utilization-focused evaluation asks is: What are the expected uses by intended users before and during the evaluation? Maybe serious consideration is what they expect as well; but maybe they expect more, or less.

Evaluators need to push intended users to be clear about what, if any, decisions are expected to be influenced by an evaluation. It is worth repeating that none of the federal health decision makers we interviewed about evaluation use had been involved in a utilization-focused process. That is, none of them had carefully considered how the

EXHIBIT 4.3

Questions to Ask of Intended Users to Establish an
Evaluation's Intended Influence on Forthcoming Decisions

What decisions, if any, are the evaluation findings expected to influence?

(There may not be any, in which case the evaluation's purpose may be simply to generate
knowledge for conceptual use and future enlightenment. If, however, the evaluation is
expected to influence decisions, clearly distinguish summative decisions about program
funding, continuation, or expansion from formative decisions about program improvement
and ongoing development.)

When will decisions be made? By whom? When, then, must the evaluation findings be presented to
be timely and influential?

What is at stake in the decisions? For whom? What controversies or issues surround the decisions?

What's the history and context of the decision-making process?

What other factors (values, politics, personalities, promises already made) will affect the decision
making? What might happen to make the decision irrelevant or keep it from being made? In other
words, how volatile is the decision making environment?

How much influence do you expect the evaluation to have—realistically?

To what extent has the outcome of the decision already been determined?

What data and findings are needed to support decision making?

What needs to be done to achieve that level of influence?

(Include special attention to which stakeholders to involve for the evaluation to have the
expected degree of influence.)

How will we know afterward if the evaluation was used as intended?

(In effect, how can use be measured?)

evaluation would be used in advance of data collection. My experiences in pushing decision makers and intended users to be more intentional and prescient about evaluation use during the design phase have taught me that it is possible to significantly increase the degree of influence evaluations have. Doing so, however, requires persistence in asking the following kinds of questions: What decisions, if any, is the evaluation expected to influence? What is at stake? When will decisions be made? By whom? What other factors (values, politics, personalities, promises already made) will affect the decision making? How much influence do you expect the evaluation to have? What needs to be done to achieve that level of influence? How will we know afterward if the evaluation was used as intended? (In effect, how can use be measured?) Exhibit 4.3 offers a number of questions to ask of intended users to establish an evaluation's intended influence on forthcoming decisions.

Connecting Decisions to Uses

Where the answers to the evaluator's questions indicate a major decision about program merit, worth, continuation, expansion, dissemination, and/or funding is at stake, then the evaluation should be designed to render overall judgment—summative judgment. The design should be sufficiently rigorous and the data collected should be sufficiently credible that a summative decision can be made. The findings must be available in time to influence this kind of major decision.

Where the dialogue with primary intended users indicates an interest in identifying strengths and weaknesses, clarifying the program's model, and generally working at increased effectiveness, the evaluation should be framed to support improvement-oriented decision making. Skills in offering formative feedback and creating an environment of mutual respect and trust between the evaluator and staff will be as important as actual findings.

Where the intended users are more concerned about generating knowledge for formulating future programs than with making decisions about current programs, then some form of synthesis or cluster evaluation will be most appropriate to discover generic principles of effectiveness.

In helping intended users select from the evaluation menu, and thereby focus the evaluation, evaluators may encounter some reluctance to make a commitment. I worked with one director who proudly displayed this sign on his desk: "My decision is maybe—and that's final." Unfortunately, the sign was all too accurate. He wanted me to decide what kind of evaluation should be done. After several frustrating attempts to narrow the evaluation's focus, I presented what I titled a "MAYBE DESIGN." I laid out cost estimates for an all-encompassing evaluation that included formative, summative, and knowledge-generating components looking at all aspects of the program. Putting dollars and time lines to the choices expedited the decision making considerably. He decided not to undertake any evaluation "at this time."

I was relieved. I had become skeptical about the potential for doing anything useful. Had I succumbed to the temptation to become the decision maker, an evaluation would have been done, but it would have been my evaluation, not his. I'm convinced he would have waffled over using the findings as he waffled over deciding what kind of evaluation to do.

Thus, in utilization-focused evaluation, the choice of not dining at all is always on the menu. It's better to find out before preparing the meal that those invited to the banquet are not really hungry. Take your feast elsewhere, where it will be savored.
Intended Process Uses:
Impacts of Evaluation Thinking and Experiences

Utility is in the eye of the user.


In the past, the search for use has often been conducted like the search for contraband in the famous Sufi story about Nasrudin the smuggler.

Nasrudin used to take his donkey across a frontier every day with the panniers loaded
with straw. Since he admitted to being a smuggler, when he trudged home every night,
the frontier guards searched him carefully. They searched his person, sifted the straw,
steeped it in water, even burned it from time to time. Meanwhile, he was becoming
visibly more and more prosperous.
Eventually, he retired to another country, very wealthy. Years later one of the customs
officials encountered him there. "You can tell me now, Nasrudin," he said. "Whatever
was it that you were smuggling, that we could never catch you at?"
"Donkeys," replied Nasrudin, grinning.
—Adapted from Shah 1964:59


Process as Outcome

In this chapter, we'll consider ways in which being engaged in the processes of evaluation can be useful quite apart from the findings that may emerge from those processes. Reasoning processes are evaluation's donkeys; they carry the load. Reasoning like an evaluator and operating according to evaluation's values have impacts. When I refer to process use, then, I mean using the logic, employing the reasoning, and being guided by the values that undergird the profession (Fournier 1995; Whitmore 1990; House 1980). Exhibit 5.1 provides examples of evaluation logic and values.

Those of us trained in the methods of research and evaluation can easily take for granted the logic that undergirds those methods. Like people living daily inside any culture, the way of thinking of our culture—the research culture—seems natural and easy. However, to practitioners, decision makers, and policymakers, our logic can be hard to grasp and quite unnatural. I'm talking about what appear to be very simple, even simplistic, notions that have profound effects on how one views the world. Thinking in terms of what's clear, specific, concrete, and observable does not come easily to people who thrive on, even depend on, vagueness, generalities, and untested beliefs as the basis for action. They're in the majority. Practitioners of evaluation logic are a small minority. The good news is that our way of thinking, once experienced, is often greatly valued. That's what creates demand for our services. Learning to see the world as an evaluator sees it often has a lasting impact on those who participate in an evaluation—an impact that can be greater and last longer than the findings that result from that same evaluation.

How do I know this? Because that's often what intended users tell me when I follow up the evaluations I conduct to evaluate use. Months after an evaluation, I'll talk with clients (intended users) to get their assessments of whether the evaluation achieved its intended uses and to find out what other impacts may have resulted. They often say some version of the following, a response from an experienced and wise program director:

    We used the findings to make some changes in our intake process and improvements in the treatment program. We reorganized parts of the program and connected them together better. But you know, the big change is in our staff's attitude. They're paying more attention to participant reactions on a daily basis. Our staff meetings are more outcomes oriented and reflective. Staff exchanges about results are more specific and data based. We're more focused. And the fear of evaluation is gone. Doing the evaluation had a profound impact on our program culture. It really did.

Any evaluation can, and often does, have these kinds of effects. What's different about utilization-focused evaluation is that the process of actively involving intended users increases these kinds of evaluation impacts. Furthermore, the possibility and desirability of learning from evaluation processes as well as findings can be made intentional and purposeful. In other words, instead of treating process use as an informal offshoot, explicit and up-front attention to the potential impacts of evaluation logic and processes can increase those impacts and make them a planned purpose for undertaking the evaluation. In that way the evaluation's overall utility is increased.

EXHIBIT 5.1
Examples of the Logic and Values of Evaluation That Have Impact on and Are Useful to Participants Who Experience Evaluation Processes

The logic and values of evaluation derive from research methods and communications. These admonitions constitute a "logic" in the sense that they represent a particular mode of reasoning viewed as valid within the culture of evaluation. They are values in the sense that they are what evaluators generally believe. The guidelines and principles below are meant to be illustrative rather than exhaustive of all possibilities.

Be clear: Be clear about goals and purposes; about what's being evaluated, what data will be collected, what judgments are to be made, how results will be used—indeed, be clear about everything.

Be specific: A favorite evaluation clarifying question: "What exactly do you mean by that?"

Focus and prioritize: You can't do or look at everything. Be intentional and purposeful in deciding what's worth doing and knowing.

Be systematic: Plan your work; work your plan. Carefully document what occurs at every stage of decision making and data collection.

Make assumptions explicit: Determine what can and cannot be subjected to empirical test.

Operationalize program concepts, ideas, and goals: The fundamental evaluation challenge is determining how to measure and observe what is important. Reality testing becomes real at this point.

Distinguish inputs and processes from outcomes: Confusing processes with outcomes is common.

Have data to provide empirical support for conclusions: This means a commitment to reality testing in which logic and evidence are valued over strength of belief and intensity of emotions.

Separate data-based statements of fact from interpretations and judgments: Interpretations go beyond the data and must be understood as what they are: interpretations. Judgments involve values, determining what is desirable or undesirable.

Make criteria and standards for judgments explicit: The logical mandates to be clear and specific apply to making criteria and standards explicit.

Limit generalizations and causal explanations to what data support: Overgeneralizations and overly definitive attributions of causality are epidemic outside the culture of research and evaluation.

Distinguish deductive from inductive processes: Both are valued but involve different reasoning sequences.

Process Use Defined

Process use refers to and is indicated by individual changes in thinking and behavior, and program or organizational changes in procedures and culture, that occur among those involved in evaluation as a result of the learning that occurs during the evaluation process. Evidence of process use is represented by the following kind of statement after an evaluation: "The impact on our program came not just from the findings but from going through the thinking process that the evaluation required."

An Analogy

Before looking in detail at how evaluation processes can affect users, let me suggest an analogy to clarify the distinction between process use versus findings use. I hike the Grand Canyon annually. During the days there, my body hardens and my thoughts soften. I emerge more mellow, peaceful, and centered. It doesn't matter which part of the Canyon I hike: the South Rim or North; whether I descend all the way to the Colorado River or stay on the Tonto to explore a side canyon; whether I push strenuously to cover as much territory as possible or plan a leisurely journey; whether I ascend some interior monument like Mount Huethawali or traverse the Supai platform that runs the length of the Canyon—I return different from when I entered. Not always different in the same way. But different.

Let me suggest that the specifics of place are like the findings of an evaluation report. The different places provide different content. From the rim, one can view magnificent vistas. Deep within a side canyon, one can see little and feel completely alone. Much of the Canyon is desert, but rare streams and even rarer waterfalls offer a stark contrast to the ancient, parched rock. Each place offers different content for reflection. The substantive insights one receives may well vary by place, time, and circumstance. But quite beyond those variations is the impact that comes from the very act of reflection—regardless of content and place. The impacts of reflection and meditation on one's inner sense of self are, for me, analogous to the impacts of engaging in the processes of evaluation, quite apart from the content of the evaluation's findings. In this same sense, for certain developmental purposes—staff development, program development, organization development—it doesn't matter so much what the focus of an evaluation is, or what its findings; some impact will come from engaging thoughtfully and seriously in the process.

A Menu: Uses of Evaluation Logic and Processes

In working with intended users, it's important to help them think about the potential and desired impacts of how the evaluation will be conducted. Questions about who will be involved take on a different degree of importance when considering that those most directly involved will not only play a critical role in determining the content of the evaluation, and therefore the focus of findings, but they also will be the people most affected by exposure to evaluation logic and processes. The degree of internal involvement, engagement, and ownership will affect the nature and degree of impact on the program's culture. How funders and users of evaluation think about and calculate the costs and benefits of evaluation also are affected. The cost-benefit ratio changes on both sides of the

equation when the evaluation produces not only findings but also serves immediate programmatic needs such as staff development or participant empowerment.

I differentiate four primary uses of evaluation logic and processes: (1) enhancing shared understandings, especially about results; (2) supporting and reinforcing the program through intervention-oriented evaluation; (3) increasing participants' engagement, sense of ownership, and self-determination (participatory and empowerment evaluation); and (4) program or organizational development. I'll discuss each of these, with examples, then consider the controversies engendered by using evaluation in these ways.

Using Evaluation to Enhance Shared Understandings

Evaluation both depends on and facilitates clear communications. Shared understandings emerge as evaluation logic pushes the senders of messages to be as specific as possible and challenges listeners to reflect on and feed back to senders what they think they've heard. Shared understandings are especially important with regard to expected results. For example, board members and program staff often have different notions of what an agency or program is supposed to accomplish. The processes of clarifying desired ends and focusing staff efforts on accomplishing those ends by evaluating actual accomplishments ought to be primary board functions, but few boards fulfill these functions effectively (Carver 1990).

I'm often asked to facilitate board or staff retreats to help them learn and apply the logic and discipline of evaluation to formulating the organization's mission and goals. The feedback I get is that the questions I pose as an evaluator (e.g., What specific results are you committed to achieving and how would you know if you accomplished them?) are different from what they are asked by non-evaluators. It's not so much that other facilitators don't ask these questions, but they don't ask them with the same seriousness and pursue the answers with the same rigor and intensity. The very process of formulating a mission and goals so they can be evaluated will usually have an impact, long before data are actually collected to measure effectiveness.

A parallel use of evaluation is to increase shared understandings between program managers and line staff. Following the admonition that "what gets measured gets done," managers can work with staff under the guidance of an evaluator to establish a monitoring system to help everyone involved stay focused on desired outcomes. While the data from such a system may ultimately support decision making, in the short run, the impact is to focus staff attention and energy on priority outcomes. The process needs to be facilitated in such a way that staff can speak openly about whether board and administrative expectations are meaningful, realistic, and attainable. In other words, done properly, evaluation facilitates shared commitments to results from top to bottom and bottom to top for "improved communication between staff at different levels of program implementation" (Aubel 1993:13).

You may have experienced both the presence and absence of evaluation logic in your education. When a teacher announces a test and says, "Here's what will be on the test and here's what I'll be looking for," that teacher is manifesting the evaluation principle that what gets measured gets done. Making criteria explicit and communicating them to all concerned is equitable and fair. In contrast, I've observed teachers

refuse to tell their class what will be on a test, then later, in individual, informal conversations, they reveal the test's focus to persistent and inquiring students. Telling everyone would have been more fair.

The logic and principles of evaluation also can be useful in negotiations between parties with different perspectives. For example, a major foundation was interested in funding an effort to make schools more racially equitable. The school district expressed great interest in such funding but resisted committing to explicit school changes that might undermine building-level autonomy or intrude into personnel evaluations of principals. Over a period of several months, the funder and school officials negotiated the project. The negotiations centered on expected evaluation outcomes. The funder and school district eventually agreed to focus the project and evaluation on community-based, school-specific action plans, activities, and changes rather than a standardized and prescribed set of district-determined mandates. Case studies were chosen as the most appropriate evaluation method, rather than standardized instruments for measuring school climate. The design of the entire project was changed and made more focused as a result of these negotiations. Applying the logic of evaluation had a major impact on the project's design without any data collection, findings, or a report. Everyone came out of the negotiations clear about what was to happen in the project and how it would be evaluated.

Inadequate specification of desired results reduces the likelihood of attaining those results. Consider how adding a results orientation changed the Request for Proposals announcement of a major environment-oriented philanthropic foundation. In the initial announcement, the foundation wanted to cast the net wide, so it issued a general invitation:

    We seek grant proposals that will enhance the health of specific ecosystems.

The responses varied greatly, with many completely missing the mark in the opinion of the foundation staff. But what was the mark? A great deal of time and effort was wasted by hopeful proposal writers who didn't know what criteria to address, and staff spent a lot of time sifting through proposals that had no hope of being funded. The process created frustration on both sides. After a planning session focused on specifying desired results and explicit evaluation criteria, the second announcement was quite a bit more focused:

    We seek grant proposals that will enhance the health of specific ecosystems. Proposals will be judged on the following criteria:
    • clarity and meaningfulness of ecosystem definition
    • private-public sector cooperation
    • action orientation and likelihood of demonstrable impact
    • incorporation of a prevention orientation
    • regional coordination

This set of criteria eliminates basic research proposals, of which a large number were received from universities in the first round, and makes it clear that those seeking grants must submit as cooperative groups rather than as single individuals or entities, also characteristic of a large number of initial proposals. Subsequent announcements became even more specific when focused on specific action priorities, such as pollution prevention. The staff, with training and facilitation, learned to use evaluation logic to articulate desired

results, enhance communications, and increase responsiveness.

A different use of evaluation to enhance mutual understanding involves designing the evaluation to "give voice" to the disenfranchised, underprivileged, poor, and others outside the mainstream (Weiss and Greene 1992:145). In the evaluation of a diversity project in the Saint Paul Schools, a major part of the design included capturing and reporting the experiences of people of color. Providing a way for African American, Native American, Chicano-Latino, and Hmong parents to tell their stories to mostly white, corporate funders was an intentional part of the design, one approved by those same white corporate funders. Rather than reaching singular conclusions, the final report was a multivocal, multicultural presentation of different experiences with and perceptions of the program's impacts. The medium of the report carried the message that multiple voices needed to be heard and valued as a manifestation of diversity (Stockdill et al. 1992). The findings were used for both formative and summative purposes, but the parents and many of the staff were most interested in using the evaluation processes to make themselves heard by those in power. Being heard was an end in itself, quite separate from use of findings.

Wadsworth (1995) has reported that evaluation processes can facilitate interactions between service providers and service users in a way that leads to "connectedness" and "dialogue across difference" (p. 9). Each learns to see the service through the others' eyes. In the process, what began as opposing groups with opposing truths is transformed into "an affinity-based community of inquiry" with shared truths.

Using evaluation to enhance shared understandings is a relatively traditional use of evaluation logic. Let's turn now to a different and more controversial use of evaluation processes: intervention-oriented evaluation.

Evaluation as an Integral Programmatic Intervention

Textbooks on measurement warn that measuring the effects of a treatment (e.g., a social program) should be independent of and separate from the treatment itself. For example, participants who take a pretest may perform better in the program than those who do not take the pretest because the pretest increases awareness, stimulates learning, and/or enhances preparation for program activities. To account for such test effects, evaluation researchers in the past have been advised to use experimental designs that permit analysis of differences in performance for those who took the pretest compared to a group that did not take the pretest. Integrating data collection into program implementation would be considered a problem—a form of treatment contamination—under traditional rules of research.

Departing from defining evaluation as rigorous application of social science methods opens a different direction in evaluation (Patton 1988), one that supports integration of evaluation into program processes. Making data collection integral rather than separate can reinforce and strengthen the program intervention. Such an approach also can be cost-effective and efficient since, when evaluation becomes integral to the program, its costs aren't an add-on. This enhances the sustainability of evaluation because, when it's built in rather than added on, it's not viewed as a temporary effort or luxury that can be easily dispensed with when cuts are necessary.

To illustrate this approach, consider the case of a one-day workshop. A traditional evaluation design, based on standard social science standards of rigor, would typically include a pretest and posttest to assess changes in participants' knowledge, skills, and attitudes. As the workshop opens, participants are told,

    Before we begin the actual training, we want you to take a pretest. This will provide a baseline for our evaluation so we can find out how much you already know and then measure how much you've learned when you take the posttest.

At the end of the day, participants are administered the same instrument as a posttest. They are told,

    Now the workshop is over, but before you leave, we need to have you take the posttest to complete the evaluation and find out how much you have benefited from the training.

The desired design for high internal validity would include, in addition to the pre-post treatment group, (1) a control group that takes the pre- and posttests without experiencing the workshop, (2) a control group that gets the posttest only, and (3) a treatment group that gets the posttest only. All groups, of course, should be randomly selected and assigned, and the administration of the test should be standardized and take place at the same time. Such a design would permit measurement of and control for instrumentation effects.

Let me now pose a contrary example of how the evaluation might be handled, a design that fully integrates the evaluation data collection into the program delivery, that is, a design that makes the data collection part of the workshop rather than separate from and independent of the workshop. In this scenario, the workshop begins as follows:

    The first part of the workshop involves your completing a self-assessment of your knowledge, skills, and attitudes. This will help you prepare for and get into thinking about the things we will be covering today in your training.

The workshop then proceeds. At the end of the day, the workshop presenter closes as follows:

    Now the final workshop activity is for you to assess what you have learned today. To that end, we are going to have you retake the self-assessment you took this morning. This will serve as a review of today and let you see how much you've learned.

In this second scenario, the word evaluation is never mentioned. The pre- and post-assessments are explicitly and intentionally part of the workshop in accordance with adult learning principles (Brookfield 1990; Knox 1987; Schon 1987; Knowles et al. 1985). We know, for example, that when participants are told what they will learn, they become prepared for the learning; learning is further enhanced when it is reinforced both immediately and over the long term. In the second scenario, the self-assessment instrument serves both the function of preparing people for learning and as baseline data. The posttest serves the dual functions of learning reinforcement and evaluation. Likewise, a six-month follow-up to assess retention can serve the dual functions of learning reinforcement and longitudinal evaluation.

The methodological specialist will note that the second scenario is fraught with threats to validity. However, the purpose of data collection in this second scenario is not only assessment of the extent to which change has occurred, but increasing the likelihood that change will occur. It does not matter to these particular intended users (the workshop instructors) how much of the measured change is due to pretest sensitization versus actual learning activities, or both, as long as the instrument items are valid indicators of desired outcomes. Moreover, in the second scenario, the data collection is so well integrated into the program that there are no separate evaluation costs except for the data analysis itself. Under the second scenario, the administration of the pretest and posttest is a part of the program such that even if the data were not analyzed for evaluation purposes, the data collection would still take place, making evaluation data collection highly cost-effective.

Principles of Intervention-Oriented Evaluation

I have called this process intervention-oriented evaluation to make explicit the direct and integral connection between data collection and program results. A program is an intervention in the sense that it is aimed at changing something. The evaluation becomes part of the programmatic intervention to the extent that the way it is conducted supports and reinforces accomplishing desired program goals. The primary principle of intervention-oriented evaluation is to build a program delivery model that logically and meaningfully interjects data collection in ways that enhance achievement of program outcomes, while also meeting evaluation information needs.

We followed this principle in evaluating a wilderness program that aimed to turn college administrators into leaders in experiential education. Participants hiked 10 days in the Gila Wilderness of New Mexico in the fall, climbed the Kofa Mountains of Arizona in the winter, and rafted the San Juan River in Utah in the spring. During these trips, participants kept journals for reflection. The program's philosophy was, "One doesn't just learn from experience; one learns from reflection on experience." The process of journaling was part of the program intervention, but also a prime source of qualitative evaluation data capturing how participants reacted to and were changed by project participation. In addition, participants were paired together to interview each other before, during, and after each wilderness experience. These interviews were part of the project's reflection process, but also a source of case data for evaluation. The evaluation process thus became part of the intervention in providing participants with experiences in reflective practice (Schon 1987, 1983). Indeed, it was on this project that I first learned how profoundly in-depth interviews can affect people. Such personal, intensive, and reflective data collection is an intervention. In intervention-oriented evaluation, such data collection is designed to reinforce and strengthen the program's impact.

Another, quite different, example comes from an intervention-designed evaluation of an international development effort called the Caribbean Agricultural Extension Project, funded by the U.S. Agency for International Development (U.S. AID). The project aimed to improve national

agricultural extension services in eight Caribbean countries. The project began with a rapid reconnaissance survey to identify the farming systems in each participating island. This involved an interdisciplinary team of agricultural researchers, social scientists, and extension staff doing fieldwork and interviewing farmers for a period of 10 days to identify extension priorities for a specific agro-ecological zone. This process served as the basis for needs assessment and program development. It was also, quite explicitly and intentionally, an intervention in and of itself in that the process garnered attention from both farmers and agricultural officials, thereby beginning the extension mobilization process. In addition, the rapid reconnaissance survey served the critical evaluation function of establishing baseline data. Subsequent data on the effects of extension and agricultural development in the zone were compared against this baseline for evaluation purposes. Yet, it would have been much too expensive to undertake this kind of intensive team fieldwork simply for purposes of evaluation. Such data collection was practical and cost-effective because it was fully integrated into other critical program processes.

Once the various farming systems were identified and the needs of farmers had been specified within those systems, the extension staff began working with individual farmers to assess their specific production goals. This process included gathering data about the farmer's agricultural enterprises and household income flows. With these data in hand, extension agents worked with farmers to set realistic goals for change and to help farmers monitor the effects of recommended interventions. The program purpose of using this approach, called a farm management approach, was to individualize the work of extension agents with farmers so that the agent's recommendations were solidly grounded in knowledge of the farm and household situation, including labor availability, land availability, income goals, and past agricultural experiences. These data were necessary for the extension agent to do a good job of advising farm families about increasing their productivity.

These same data were the baseline for measuring the program's impact on individual farmers for evaluation purposes. The collection of such data for farm management purposes required training of agents, and a great deal of time and effort. It would have been enormously expensive to collect such data independently, solely for purposes of evaluation. However, by establishing a record-keeping system for individual farmers that served a primary extension purpose, the project also established a record-keeping system for evaluation purposes. By aggregating the data from individual households, it was possible to analyze system-level impact over time. The data aggregation and comparative analysis were above and beyond the main program purpose of collecting the data. However, without that program purpose, the data would have been much too expensive to collect solely for evaluation of the system.

The program staff also used the evaluation design formulated by the external evaluators as the framework for their plan of work, which set the agenda for monthly staff meetings and quarterly staff reports (an example of using evaluation to enhance and focus communications). As such, the evaluation priorities were kept before the staff at all times. As a result, the evaluation process improved program implementation from the very beginning by focusing staff implementation efforts.

Still another powerful example of intervention-oriented evaluation comes from
Intended Process Uses • 97

the Hazelden Foundation, a chemical dependency treatment program in Minnesota. Part of the program intervention includes helping clients and their significant others identify their chemical abuse patterns. A self-assessment instrument serves this purpose while also providing baseline data on chemical use. After residency treatment, all clients and significant others receive follow-up surveys at six months, one year, and two years. The follow-up surveys provide outcomes data on program effectiveness, but they also remind clients and their significant others to assess their current chemical use behaviors. Clients who have relapsed into abusive behaviors are invited to contact Hazelden for support, assessment, and possible reentry into treatment. Thus, the follow-up survey is a mechanism for reinforcing treatment and extending an offer of new help. Many clients respond to this contact and seek additional help. For that reason, the survey is sent to all former clients, not just the small random sample that would be sufficient if the survey provided only evaluation data.

In my experience, program funders, managers, and staff can become very excited about the creative possibilities for integrating evaluation into a program in such a way that it supports and reinforces the program intervention. Not only does this make the evaluation process more useful, it often makes the evaluation findings more relevant, meaningful, accessible, and useful. Yet, this approach can be controversial because the evaluation's credibility may be undercut by concerns about whether the data are sufficiently independent of the treatment to be meaningful and trustworthy; the evaluator's independence may be suspect when the relations with staff and/or participants become quite close; and the capacity to render an independent, summative judgment may be diminished. These are considerations to discuss with intended users and evaluation funders in deciding the relative priority of different potential uses of evaluation and in reviewing the principles of intervention-oriented evaluation (Exhibit 5.2). Now, let's examine the use of evaluation processes to engage participants more fully.

Supporting Engagement, Self-Determination, and Ownership: Participatory, Collaborative, and Empowerment Evaluation

Early in my career, I was commissioned by a Provincial Deputy Minister in Canada to undertake an evaluation in a school division he considered mediocre. I asked what he wanted the evaluation to focus on.

"I don't care what the focus is," he replied. "I just want to get people engaged in some way. Education has no life there. Parents aren't involved. Teachers are just putting in time. Administrators aren't leading. Kids are bored. I'm hoping evaluation can stir things up and get people involved again." That's how the evaluation of the Frontier School Division, described in Chapter 2, began.

The processes of participation and collaboration have an impact on participants and collaborators quite beyond whatever they may accomplish by working together. In the process of participating in an evaluation, participants are exposed to and have the opportunity to learn the logic of evaluation and the discipline of evaluation reasoning. Skills are acquired in problem identification, criteria specification, and data collection, analysis, and interpretation. Acquisition of evaluation skills and ways of thinking can have a longer-term impact than the use of findings from a particular evaluation study.

EXHIBIT 5.2
Principles of Intervention-Oriented Evaluation

• The evaluation is designed to support, reinforce, and enhance attainment of desired program goals.
• Evaluation data collection and use are integrated into program delivery and management. Rather than being separate from and independent of program processes, the evaluation is an integral part of those processes.
• Program staff and participants know what is being evaluated and know the criteria for judging success.
• Feedback of evaluation findings is used to increase individual participant goal attainment as well as overall program goal attainment.
• There are no or only incidental add-on costs for data collection because data collection is part of program design, delivery, and implementation.
• Evaluation data collection, feedback, and use are part of the program model; that is, evaluation is a component of the intervention.

Moreover, people who participate in creating something tend to feel more ownership of what they have created, make more use of it, and take better care of it. Active participants in evaluation, therefore, are more likely to feel ownership not only of their evaluation findings, but also of the evaluation process itself. Properly, sensitively, and authentically done, it becomes their process.

Participants and collaborators can be staff and/or program participants (e.g., clients, students, community members). Sometimes administrators, funders, and others also participate, but the usual connotation is that the primary participants are "lower down" in the hierarchy. Participatory evaluation is bottom up.

In 1995, evaluators interested in "Collaborative, Participatory, and Empowerment Evaluation" formed a Topical Interest Group within the American Evaluation Association. What these approaches have in common is a style of evaluation in which the evaluator becomes a facilitator, collaborator, and teacher in support of program participants and staff engaging in their own evaluation. While the findings from such a participatory process may be useful, the more immediate impact is to use the evaluation process to increase participants' sense of being in control of, deliberative about, and reflective on their own lives and situations.

The labels participatory evaluation and collaborative evaluation mean different things to different evaluators. Some use these phrases interchangeably or as mutually reinforcing concepts (e.g., Dugan 1996; Powell, Jeffries, and Selby 1989; Whitmore and Kerans 1988). Wadsworth (1993b) distinguishes "research on people, for people, or with people" (p. 1). Whitmore (1988) has defined the participatory approach as combining "social investigation, education, and action with the ultimate purpose of engendering broad community and social change" (p. 3).

Whitmore worked with a community-based team and contended that, through the evaluation process, participants not only gained new knowledge and skills but also created a support network among themselves and gained a greater sense of self-efficacy.

In the mid-1980s, several international grassroots development organizations advocated participatory evaluation as a tool for community and local leadership development, not only as a management tool (PACT 1986). In advocating for participatory evaluation, the Evaluation Sourcebook of the American Council of Voluntary Agencies for Foreign Service (ACVAFS 1983) asserted, "Participation is what development is about: gaining skills for self-reliance" (p. 12). Thus, in developing countries, participatory evaluation has been linked to community development and empowerment; industrialized countries, where notions of "value-free" social science have long been dominant, have come to this idea of linking evaluation participation with empowerment more slowly, and, as we shall see later, the notion remains controversial.

Norman Uphoff (1991) has published A Field Guide for Participatory Self-Evaluation, aimed at grassroots community development projects. After reviewing a number of such efforts, he concluded,

If the process of self-evaluation is carried out regularly and openly, with all group members participating, the answers they arrive at are in themselves not so important as what is learned from the discussion and from the process of reaching consensus on what questions should be used to evaluate group performance and capacity, and on what answers best describe their group's present status. (p. 272)

Here is clear support for the central premise of this chapter: The process of engaging in evaluation can have as much or more impact than the findings generated. It was not a group's specific questions or answers that Uphoff found most affected the groups he observed. It was the process of reaching consensus about questions and engaging with each other in the meaning of the answers turned up. The process of participatory self-evaluation, in and of itself, provided useful learning experiences for participants.

Since no definitive definitions exist for participatory and collaborative evaluation, these phrases must be defined and given meaning in each setting where they're used. Exhibit 5.3 presents what I consider the primary principles of participatory evaluation. This list can be a starting point for working with intended participants to decide what principles they want to adopt for their own process.

Cousins and Earl (1995, 1992) have advocated participatory and collaborative approaches primarily to increase use of findings: "Unlike emancipatory forms of action research, the rationale for participatory evaluation resides not in its ability to ensure social justice or to somehow even the societal playing field but in the utilization of systematically collected and socially constructed knowledge" (p. 10). Yet, the authors go beyond increased use of findings when they discuss how participation helps create a learning organization. Viewing participatory evaluation as a means of creating an organizational culture committed to ongoing learning has become an important theme in recent literature linking evaluation to learning organizations (e.g., King 1995; Aubel 1993; Leeuw, Rist, and Sonnichsen 1993; Sonnichsen 1993). "The goal of a participatory evaluator is eventually to put him or herself out of work when

EXHIBIT 5.3
Principles of Participatory Evaluation

• The evaluation process involves participants in learning evaluation logic and skills, for example, goal setting, establishing priorities, focusing questions, interpreting data, data-based decision making, and connecting processes to outcomes.
• Participants in the process own the evaluation. They make the major focus and design decisions. They draw and apply conclusions. Participation is real, not token.
• Participants focus the evaluation on process and outcomes they consider important and to which they are committed.
• Participants work together as a group, and the evaluation facilitator supports group cohesion and collective inquiry.
• All aspects of the evaluation, including the data, are understandable and meaningful to participants.
• Internal, self-accountability is highly valued. The evaluation, therefore, supports participants' accountability to themselves and their community first, and external accountability secondarily, if at all.
• The evaluator is a facilitator, collaborator, and learning resource; participants are decision makers and evaluators.
• The evaluation facilitator recognizes and values participants' perspectives and expertise and works to help participants recognize and value their own and each other's expertise.
• Status differences between the evaluation facilitator and participants are minimized.

the research capacity of the organization is self-sustaining" (King 1995:89). Indeed, the self-evaluating organization (Wildavsky 1985) constitutes an important direction in the institutionalization of evaluation logic and processes.

Utilization-focused evaluation is inherently participatory and collaborative in actively involving primary intended users in all aspects of the evaluation. Evidence presented in earlier chapters has demonstrated the effectiveness of this strategy for increasing use of findings. The added emphasis of this chapter is how participation and collaboration can lead to an ongoing, longer-term commitment to using evaluation logic and building a culture of learning in a program or organization. Making this kind of process use explicit enlarges the menu of potential evaluation uses. How important this use of evaluation should be in any given evaluation is a matter for negotiation with intended users. The practical implication of an explicit emphasis on creating a learning culture as part of the process will mean building into the evaluation attention to and training in evaluation logic and skills.

Not all references to participatory or collaborative evaluation make the link to participant learning. Levin (1993) distinguished three purposes for collaborative

research: (1) the pragmatic purpose of increasing use, (2) the philosophical or methodological purpose of grounding data in practitioners' perspectives, and (3) the political purpose of mobilizing for social action. A fourth purpose, identified here, is teaching evaluation logic and skills. In the next section, we'll examine in greater depth the political uses of evaluation to mobilize for social action and support social justice.

Empowerment Evaluation

The theme of the 1993 American Evaluation Association national conference was "Empowerment Evaluation." David Fetterman (1993), AEA President that year, defined empowerment evaluation as "the use of evaluation concepts and techniques to foster self-determination. The focus is on helping people help themselves" (p. 115).

Self-determination, defined as the ability to chart one's own course in life, forms the theoretical foundation of empowerment evaluation. It consists of numerous interconnected capabilities that logically follow each other . . . : the ability to identify and express needs, establish goals or expectations and a plan of action to achieve them, identify resources, make rational choices from various alternative courses of action, take appropriate steps to pursue objectives, evaluate short- and long-term results (including reassessing plans and expectations and taking necessary detours), and persist in pursuit of those goals. (Fetterman 1994a:2)

These skills are used to realize the group's own political goals; through self-assessment and a group's knowledge of itself, it achieves accountability unto itself as well as to others (Fetterman, Kaftarian, and Wandersman 1996). In so doing, community capacity can also be enhanced as a group realizes and builds on its assets (Mayer 1996, n.d.).

Empowerment evaluation is most appropriate where the goals of the program include helping participants become more self-sufficient and personally effective. In such instances, empowerment evaluation is also intervention oriented in that the evaluation is designed and implemented to support and enhance the program's desired outcomes. Weiss and Greene (1992) have shown how empowerment partnerships between evaluators and program staff were particularly appropriate in the family support movement, because that movement emphasized participant and community empowerment.

I facilitated a cluster team evaluation of 34 programs serving families in poverty (Patton et al. 1993). A common and important outcome of those programs was increased intentionality—having participants end up with a plan, a sense of direction, an assumption of responsibility for their lives, and a commitment to making progress. Increased intentionality began with small first steps. Families in poverty often feel stuck where they are or are experiencing a downward spiral of worsening conditions and ever greater hopelessness. These programs commonly reported that it was a major achievement to give people a sense of hope manifest in a concrete plan that participants had developed, understood, and believed they could accomplish. Increased intentionality is a commitment to change for the better and a belief that such a change is possible. Thus, the programs collectively placed a great deal of emphasis on developing such skills as goal setting, learning to map out strategies for attaining goals, and monitoring progress in attaining personal goals. The programs' evaluations

were built around these family plans and supported them. Developing family plans was not an end in itself, but the ability and willingness to work on a plan emerged as a leading indicator of the likelihood of success in achieving longer-term outcomes. Creating and taking ownership of a plan became milestones of progress. The next milestone was putting the plan into action.

Another empowering outcome of participatory evaluation is forming effective groups for collective action and reflection. For example, social isolation is a common characteristic of families in poverty. Isolation breeds a host of other problems, including family violence, despair, and alienation. Bringing participants together to establish mutual goals of support and identifying ways of evaluating (reality-testing) goal attainment is a process of community development. The very process of working together on an evaluation has an impact on the group's collective identity and skills in collaborating and supporting each other. Participants also learn to use expert resources, in this case, the facilitating evaluator, but inquiry is democratized (IQREC 1997). One poverty program director explained to me the impact of such a process as she observed it:

It's hard to explain how important it is to get people connected. It doesn't sound like a lot to busy middle-class people who feel their problem is too many connections to too many things. But it's really critical for the people we work with. They're isolated. They don't know how the system works. They're discouraged. They're intimidated by the system's jargon. They don't know where to begin. It's just so critical that they get connected, take action, and start to feel effective. I don't know how else to say it. I wish I could communicate what a difference it makes for a group of poor people who haven't had many years of formal education to share the responsibility to evaluate their own program experiences, learn the language of evaluation, deal with data, and report results. It's very empowering.

Empowerment and Social Justice

The phrase "empowerment evaluation" can make people bridle. It comes across to some like a trendy buzzword. Others experience it as oxymoronic or disingenuous. Still others find the phrase offensive and condescending. Few people, in my experience, react neutrally. Like the strategic planning term proactive, the word empowerment can create hostile reactions and may fall on hard times.

Empowerment carries an activist, social change connotation, as does a related idea, using evaluation for social justice. Vera, the main character in Nadine Gordimer's (1994) novel, None to Accompany Me, exclaims, after a lengthy exchange about empowerment of South African Blacks, "Empowerment, what is this new thing? What happened to what we used to call justice?" (p. 285). Perhaps Vera would have been pleased by the theme chosen by President Karen Kirkhart for the American Evaluation Association national conference in 1994 (the year after Empowerment Evaluation was the theme): "Evaluation and Social Justice."

The first prominent evaluation theorist to advocate valuing based on principles of social justice was Ernest House (1990b, 1980). He has consistently voiced concern for democratizing decision making. In that context, he has analyzed the ways in which evaluation inevitably becomes a political tool in that it affects "who gets what" (distributive justice). Evaluation can enhance

fair and just distribution of benefits and responsibilities, or it can distort such distributions and contribute to inequality. In rendering judgments on programs, the social justice evaluator is guided by such principles as equality, fairness, and concern for the common welfare (Sirotnik 1990).

Both social justice and empowerment evaluation change the role of the evaluator from the traditional judge of merit or worth to a social change agent. Many evaluators surveyed by Cousins et al. (1995) were hostile to or at least ambivalent about whether participatory evaluation can or should help bring about social justice. Certainly, evaluators undertaking such an approach need to be comfortable with and committed to it, and such an activist agenda must be explicitly recognized by, negotiated with, and formally approved by primary intended users.

From a utilization-focused perspective, the important point is this: Using evaluation to mobilize for social action, empower participants, and support social justice are options on the menu of evaluation process uses. Since how these options are labeled will affect how they are viewed, when discussing these possibilities with primary intended users, evaluation facilitators will need to be sensitive to the language preferences of those involved.

Now, we turn to a conceptually different use of evaluation processes, what I'll call here developmental evaluation.

Program and Organization Development: Developmental Evaluation

The profession of program evaluation has developed parallel to the professions of management consulting and organization development (OD). OD consultants advise on and facilitate a variety of change processes (O'Toole 1995; Kanter, Stein, and Jick 1992; Fossum 1989; McLean 1982), including solving communications problems (D'Aprix 1996); conflict resolution (Kottler 1996); strategic planning (Bryson 1995); leadership development (Kouzes and Posner 1995; Terry 1993; Bryson and Crosby 1992; Schein 1985; Argyris 1976); teamwork (Parker 1996); human resources (Argyris 1974); diversity training (Morrison 1995); shaping organizational culture (Hampden-Turner 1990; Schein 1989); organizational learning (Aubrey and Cohen 1995; Watkins and Marsick 1993; Senge 1990; Morgan 1989; Argyris 1982); and defining mission, to name but a few OD arenas of action (Handy 1993; Massarik 1990; Morgan 1986; Azumi and Hage 1972). Sometimes their methods include organizational surveys and field observations, and they may facilitate action research as a basis for problem solving (Whyte 1991; Schon 1987; Argyris, Putnam, and Smith 1985; Wadsworth 1984) or even evaluation (King 1995; Prideaux 1995; Wadsworth 1993a, 1993b; Patton 1990:157-62). Program evaluation can be viewed as one approach on the extensive menu of organization and program development approaches. Evaluation's niche is defined by its emphasis on reality testing based on systematic data collection for improvement, judging merit and worth, or generating knowledge about effectiveness. The processes of evaluation support change in organizations by getting people engaged in reality testing, that is, helping them think empirically, with attention to specificity and clarity, and teaching them the methods and utility of data-based decision making. Bickman (1994), in an article entitled "An Optimistic View of Evaluation," predicted that evaluators in the future would become more involved in pro-

gram development, especially "front end" assistance as part of a development team. For example, evaluability assessment (Wholey 1994; Smith 1989) has emerged as a process for evaluators to work with program managers to help them get ready for evaluation. It involves clarifying goals, finding out various stakeholders' views of important issues, and specifying the model or intervention to be assessed. From my perspective, this is really a fancy term that gives evaluators a credible niche for doing program and organizational development. Time and time again, evaluators are asked to undertake an evaluation only to find that goals are muddled, key stakeholders have vastly different expectations of the program, and the model that the program supposedly represents, that is, its intervention, is vague at best. In other words, the program has been poorly designed, conceptualized, or developed. In order to do an evaluation, the evaluator has to make up for these deficiencies. Thus, by default, the evaluator becomes a program or organizational developer. Rog (1985) studied the use of evaluability assessments and found that many of them precipitated substantial program change but did not lead to a formal evaluation. The programs realized through the process of evaluability assessment that they had a lot more development to do before they could or should undertake a formal evaluation, especially a summative evaluation. In such cases, the processes and logic of evaluation have impact on program staff quite beyond the use of findings from the assessment.

Mission-oriented evaluation is an organizational development approach that involves assessing the extent to which the various units and activities of the organization are consistent with its mission. For example, I evaluated the extent to which 550 grants made by the Northwest Area Foundation over five years were congruent with its mission. The board used that assessment at a retreat to review and then revise the organization's mission. The process of clarifying the foundation's mission with staff and board directors had at least as much impact as the findings (Hall 1992).

Action research (King and Lonnquist 1994a, 1994b), evaluability assessment, and mission-oriented evaluation facilitate organizational change through the processes staff experience as much as through any findings generated. That is also the case for a type of evaluation partnership aimed explicitly at development: developmental evaluation.

Developmental Evaluation

I introduced the term developmental evaluation (Patton 1994a) to describe certain long-term, partnering relationships with clients who are themselves engaged in ongoing program or organizational development. (See Exhibit 5.4 for a formal definition of developmental evaluation.) These clients incorporate me into their decision-making process as part of their design teams because they value the logic and conceptual rigor of evaluation thought, as well as the knowledge I've picked up about effective programming based on accumulated evaluation wisdom. My role is to ask evaluative questions and hold their feet to the fire of reality testing. Evaluation data are collected and used as part of this process, to be sure, but quite above and beyond the use of findings, these development-oriented decision makers want to have their ideas examined in the glaring light of evaluation logic.

EXHIBIT 5.4
Developmental Evaluation Defined

Developmental evaluation refers to evaluation processes undertaken for the purpose of supporting program, project, staff, and/or organizational development, including asking evaluative questions and applying evaluation logic for developmental purposes. The evaluator is part of a team whose members collaborate to conceptualize, design, and test new approaches in a long-term, ongoing process of continuous improvement, adaptation, and intentional change. The evaluator's primary function in the team is to elucidate team discussions with evaluative questions, data, and logic and to facilitate data-based decision making in the developmental process.
Developmentally oriented programs have as their purpose the sometimes vague, general notion of ongoing development. The process is the outcome. They eschew clear, specific, and measurable goals up front because clarity, specificity, and measurability are limiting. They've identified an issue or problem and want to explore some potential solutions or interventions, but they realize that where they end up will be different for different participants—and that participants themselves should play a major role in goal setting. The process often includes elements of participatory evaluation, for example, engaging staff and participants in setting personal goals and monitoring goal attainment, but those goals aren't fixed—they're milestones for assessing progress, subject to change as learning occurs—so the primary purpose is program and organizational development rather than individual or group empowerment. As the evaluation unfolds, program designers observe where they end up and make adjustments based on dialogue about what's possible and what's desirable, though the criteria for what's "desirable" may be quite situational and always subject to change.

Developmentally oriented leaders in organizations and programs don't expect (or even want) to reach the state of "stabilization" required for summative evaluation. Staff don't aim for a steady state of programming because they're constantly tinkering as participants, conditions, learning, and context change. They don't aspire to arrive at a fixed model that can be generalized and disseminated. At most, they may discover and articulate principles of intervention and development, but not a replicable model that says "do X and you'll get Y." Rather, they aspire to continuous progress, ongoing adaptation, and rapid responsiveness. No sooner do they articulate and clarify some aspect of the process than that very awareness becomes an intervention and acts to change what they do. They don't value traditional characteristics of summative excellence, such as standardization of inputs, consistency of treatment, uniformity of outcomes, and clarity of causal linkages. They assume a world of multiple causes, diversity of outcomes, inconsistency of interventions, interactive effects at every level—and they find such a world exciting and desirable. They never expect to conduct a summative evaluation because they don't expect the program—or world—to hold still long enough for summative review. They expect to be forever developing and changing—and they want an evaluation approach that supports development and change.

Moreover, they don't conceive of development and change as necessarily improvements. In addition to the connotation that formative evaluation is ultimately meant to lead to summative evaluation (Scriven 1991a), formative evaluation carries a bias about making something better rather than just making it different. From a developmental perspective, you do something different because something has changed—your understanding, the characteristics of participants, technology, or the world. Those changes are dictated by your current perceptions, but the commitment to change doesn't carry a judgment that what was done before was inadequate or less effective. Change is not necessarily progress. Change is adaptation. As one design team member said,

We did the best we knew how with what we knew and the resources we had. Now we're at a different place in our development—doing and thinking different things. That's development. That's change. But it's not necessarily improvement.

Developmental programming calls for developmental evaluation in which the evaluator becomes part of the design team helping to shape what's happening, both processes and outcomes, in an evolving, rapidly changing environment of constant interaction, feedback, and change. The developmental perspective, as I experience it, feels quite different from the traditional logic of programming in which goals are predetermined and plans are carefully made for achieving those goals. Development-focused relationships can go on for years and, in many cases, never involve formal, written reports.

The evaluator becomes part of the program design team or an organization's management team, not apart from the team or just reporting to the team, but fully participating in decisions and facilitating discussion about how to evaluate whatever happens. All team members, together, interpret evaluation findings, analyze implications, and apply results to the next stage of development. The purpose of the evaluation is to help develop the intervention; the evaluator is committed to improving the intervention and uses evaluative approaches to facilitate ongoing development.

Five Examples of Developmental Evaluation

1. A community leadership program. With two evaluation colleagues, I became part of the design team for a community leadership program in rural Minnesota. The design team included a sociologist, a couple of psychologists, a communications specialist, some adult educators, a funder, and program staff. All design team members had a range of expertise and experiences. What we shared was an interest in leadership and community development. The relationship lasted over six years and involved different evaluation approaches each year. During that time, we engaged in participant observation, several different surveys, field observations, telephone interviews, case studies of individuals and communities, cost analyses, theory of action conceptualizations, futuring exercises, and training of participants to do their own evaluations. Each year, the program changed in significant ways and new
evaluation questions emerged. Program goals and strategies evolved. The evaluation evolved. No final report was ever written. The program continues to evolve—and continues to rely on developmental evaluation.

2. Supporting diversity in schools. A group of foundations agreed to support multicultural education in the Saint Paul Public Schools for 10 or more years. Community members identified the problem as low levels of success for children of color on virtually every indicator they examined, for example, attendance, test scores, and graduation. The "solution" called for a high degree of community engagement, especially by people of color, in partnering with schools. The nature of the partnering and interim outcomes were to emerge from the process. Indeed, it would have been "disempowering" to local communities to predetermine the desired strategies and outcomes prior to their involvement. Moreover, different communities of color—African Americans, Native Americans, Hispanics, and Southeast Asians—could be expected to have varying needs, set differing goals, and work with the schools in different ways. All of these things had to be developed.

The evaluation documented developments, provided feedback at various levels from local communities to the overall district, and facilitated the process of community people and school people coming together to develop evaluative criteria and outcome claims. Both the program design and evaluation changed at least annually, sometimes more often. In the design process, lines between participation, programming, and evaluation were ignored as everyone worked together to develop the program. As noted earlier in this chapter, the evaluation reports took the form of multiple voices presenting multiple perspectives. These voices and perspectives were facilitated and organized by the evaluation team, but the evaluator's voice was simply one among many. The developmental evaluation and process are still ongoing as this is being written. No summative evaluation is planned or deemed appropriate, though a great deal of effort is going into publicly communicating the developmental processes and outcomes.

3. Children's and families' community initiative. A local foundation made a 20-year commitment to work with two inner-city neighborhoods to support a healthier environment for children and families. The communities are poor and populated by people of diverse ethnic and racial backgrounds. The heart of the commitment was to provide funds for people in the community to set their own goals and fund projects they deemed worthwhile. A community-based steering committee became, in effect, a decision-making group for small community grants. Grant-making criteria, desired outcomes, and evaluation criteria all had to be developed by the local community. The purpose of the developmental process was to support internal, community-based accountability (as opposed to external judgment by the affluent and distant board of the sponsoring foundation). My role, then, was facilitating sessions with local community leaders to support their developing their own evaluation process and sense of shared accountability. The evaluation process had to be highly flexible and responsive. Aspects of participatory and empowerment evaluation also were incorporated. Taking a 20-year developmental perspective, where the locus of accountability is community-based rather than funder-based, changes all the usual parameters of evaluation.

4. A reflective practice process in adult education. I've been working for several years with a suburban adult and community education program in facilitating a reflective practice process for staff development and organizational change. We meet monthly to get reports from staff about their action research observations for the last month. The focus of these observations is whatever issue the group has chosen the previous month. The reflective practice process involves (1) identifying an issue, interest, or concern; (2) agreeing to try something; (3) agreeing to observe some things about what is tried; (4) reporting back to the group individually; (5) identifying patterns of experience or themes across the separate reports; (6) deciding what to try next, that is, determining the action implications of the findings; and (7) repeating the process with the new commitment to action. Over several years, this process has supported major curricular and organizational change. Evaluation is ongoing and feedback is immediate. The process combines staff and organizational development and evaluation. My role as facilitator is to keep them focused on data-based observations and help them interpret and apply findings. There are no formal reports and no formative or summative judgments in the usual evaluation sense. There is only an ongoing developmental process of incremental change, informed by data and judgment, which has led to significant cumulative evolution of the entire program. This has become a learning organization.

5. Wilderness education for college administrators. Earlier in this chapter, I described briefly the use of journals and interviews in a wilderness education program as an example of intervention-oriented evaluation. That same project provides an example of developmental evaluation. As evaluation participant observers, my evaluation partner and I provided daily feedback to program staff about issues surfacing in our interviews and observations. Staff used that feedback to shape the program, not just in the formative sense of improvement, but in a developmental way, actually designing the program as it unfolded. My evaluation partner and I became part of the decision-making staff that conceptualized the program. Our evaluative questions, quite apart from the data we gathered and fed back, helped shape the program.

An example will illustrate our developmental role. Early in the first trip, we focused staff attention on our observation that participants were struggling with the transition from city to wilderness. After considerable discussion and input from participants, staff decided to have evening discussions on this issue. Out of those discussions, a group exercise evolved in which, each morning and evening, we threw our arms about, shook our legs, and tossed our heads in a symbolic act of casting off the toxins that had surfaced from hidden places deep inside. The fresh air, beauty, quiet, fellowship, periods of solitude, and physical activity combined to "squeeze out the urban poisons." Participants left the wilderness feeling cleaner and purer than they had felt in years. They called that being "detoxified." Like the drunk who is finally sober, they took their leave from the wilderness committed to staying clear of the toxins.

No one was prepared for the speed of retoxification. Follow-up interviews revealed that participants were struggling with reentry. As evaluators, we worked with staff to decide how to support participants in dealing with reentry problems. When participants came back together three months later, they carried the knowledge that detox faded quickly and enduring
purification couldn't be expected. Then the wilderness again salved them with its cleansing power. Most left the second trip more determined than ever to resist retoxification, but the higher expectations only made the subsequent falls more distressing. Many came to the third trip skeptical and resistant. It didn't matter. The San Juan River didn't care whether participants embraced or resisted it. After 10 days rowing and floating, participants, staff, and evaluators abandoned talking about detox as an absolute state. We came to understand it as a matter of degree and a process: an ongoing struggle to monitor the poisons around us, observe carefully their effects on our minds and bodies, and have the good sense to get to the wilderness when being poisoned started to feel normal. This understanding became part of the program model developed jointly by participants, staff, and evaluators—but as evaluators we led the discussions and pushed for conceptual clarity beyond what staff and participants would likely have been able to do without an evaluation perspective.

Commentary on Developmental Evaluation

It will be clear to the reader, I trust, that my evaluation role in each of the programs just reviewed involved a degree of engagement that went beyond the independent data collection and assessment that have traditionally defined evaluation functions. Lines between evaluation and development became blurred as we worked together collaboratively in teams. I have found these relationships to be substantially different from the more traditional evaluations I conducted earlier in my practice. My role has become more developmental.

But, once again, a note of caution about language. The term development carries negative connotations in some settings. Miller (1981), in The Book of Jargon, defines development as "a vague term used to euphemize large periods of time in which nothing happens" (p. 208). Evaluators are well advised to be attentive to what specific words mean in a particular context to specific intended users—and to choose their terms accordingly.

One reaction I've had from colleagues is that the examples I've shared above aren't "evaluations" at all but rather organizational development efforts. I won't quarrel with that. There are sound arguments for defining evaluation narrowly in order to distinguish genuinely evaluative efforts from other kinds of organizational mucking around. But, in each of the examples I've shared, and there are many others, my participation, identity, and role were considered evaluative by those with whom I was engaged (and by whom I was paid). There was no pretense of external independence. My role varied from being evaluation facilitator to full team member. In no case was my role external reporting and accountability.

Developmental evaluation certainly involves a role beyond being solely an evaluator, but I include it among the things we evaluators can do because organizational development is a legitimate use of evaluation processes. What we lose in conceptual clarity and purity with regard to a narrow definition of evaluation that focuses only on judging merit or worth, we gain in appreciation for evaluation expertise. When Scriven (1995) cautions against crossing the line from rendering judgments to offering advice, I think he underestimates the valuable role evaluators can play in design and program improvement based on cumulative knowledge. Part of my value

to a design team is that I bring a reservoir of knowledge (based on many years of practice and having read a great many evaluation reports) about what kinds of things tend to work and where to anticipate problems. Young and novice evaluators may be well advised to stick fairly close to the data. However, experienced evaluators have typically accumulated a great deal of knowledge and wisdom about what works and doesn't work. More generally, as a profession, we know a lot about patterns of effectiveness, I think—and will know more over time. That knowledge makes us valuable partners in the design process. Crossing that line, however, can reduce independence of judgment. The costs and benefits of such a role change must be openly acknowledged and carefully assessed.

Concerns, Controversies, and Caveats

Menu 5.1 summarizes the four primary uses of evaluation logic and processes discussed in this chapter. As I noted in opening this chapter, any evaluation can, and often does, have these kinds of effects unintentionally or as an offshoot of using findings. What's different about utilization-focused evaluation is that the possibility and desirability of learning from evaluation processes, as well as from findings, can be made intentional and purposeful—an option for intended users to consider building in from the beginning. In other words, instead of treating process use as an informal ripple effect, explicit and up-front attention to the potential impacts of evaluation logic and processes can increase those impacts and make them a planned purpose for undertaking the evaluation. In this way the evaluation's overall utility is increased.

The four kinds of process use identified and discussed here—(1) enhancing shared understandings, (2) reinforcing interventions, (3) supporting participant engagement, and (4) developing programs and organizations—have this in common: They all go beyond the traditional focus on findings and reports as the primary vehicles for evaluation impact. As such, these new directions have provoked controversy. Six kinds of objections—closely interrelated, but conceptually distinct—arise most consistently:

1. Definitional objection. Evaluation should be narrowly and consistently defined in accordance with the "common sense meaning of evaluation," namely, "the systematic investigation of the merit or worth of an object" (Stufflebeam 1994:323). Anything other than that isn't evaluation. Adding terms such as empowerment or developmental to evaluation changes focus and undermines the essential nature of evaluation as a phenomenon unto itself.

2. Goals confusion objection. The goal of evaluation is to render judgment. "While . . . 'helping people help themselves' is a worthy goal, it is not the fundamental goal of evaluation" (Stufflebeam 1994:323).

3. Role confusion objection. Evaluators as people may play various roles beyond being an evaluator, such as training clients or helping staff develop a program, but in taking on such roles, one moves beyond being an evaluator and should call the role what it is, for example, trainer or developer, not evaluator.

While one might appropriately assist clients in these ways, such services are not evaluation. . . . The evaluator must not confuse or substitute helping and advocacy roles with
MENU 5.1
Four Primary Uses of Evaluation Logic and Processes

For the uses below, the impact of the evaluation comes from application of evaluation thinking and engaging in evaluation processes (in contrast to impacts that come from using specific findings).

Enhancing shared understandings
    Specifying intended uses to provide focus and generate shared commitment
    Managing staff meetings around explicit
    Sharing criteria for equity/fairness
    Giving voice to different perspectives and valuing diverse experiences

Supporting and reinforcing the program intervention
    Building evaluation into program delivery processes
    Having participants monitor their own
    Specifying and monitoring outcomes as integral to working with program participants

Increasing engagement, self-determination, and ownership
    Participatory and collaborative evaluation
    Empowerment evaluation
    Reflective practice

Program and organizational development
    Developmental evaluation
    Action research
    Mission-oriented, strategic evaluation
    Evaluability assessment
    Model specification

NOTE: Menu 4.1 (Chapter 4) presents a corresponding menu, "Three Primary Uses of Evaluation Findings," which addresses making judgments (e.g., summative evaluation), improving programs (formative evaluation), and generating knowledge (e.g., meta-analyses and syntheses).

rendering of assessments of the merit and/or worth of objects that he/she has agreed to evaluate. (Stufflebeam 1994:324)

Scriven (1991a) has been emphatic in arguing that being able to identify that something is or is not working (an evaluator's role) is quite different from knowing how to fix or improve it (a designer's role).

4. Threat to data validity objection. Quantitative measurement specialists teach that data collection, in order for the results to be valid, reliable, and credible, should be separate from the program being evaluated. Integrating data collection in such a way that it becomes part of the intervention contaminates both the data and the program.

5. Loss of independence objection. Approaches that depend on close relationships between evaluators and other stakeholders undermine the evaluator's neutrality and independence. "It's quite common for younger evaluators to 'go native,' that is, psychologically join the staff of the program they are supposed to be evaluating and become advocates instead of evaluators" (Scriven 1991a:41). This can lead to overly favorable findings and an inability to give honest, negative feedback.

6. Corruption and misuse objection. Evaluators who identify with and support program goals, and develop close relationships with staff and/or participants, can be inadvertently co-opted into serving public relations functions or succumb to pressure to distort or manipulate data, hide negative findings, and exaggerate positive results. Even if they manage to avoid corruption, they may be suspected of it, thus undermining the credibility of the entire profession. Or these approaches may actually serve intentional misuse and foster corruption, as Stufflebeam (1994) worries:

What worries me most about . . . empowerment evaluation is that it could be used as a cloak of legitimacy to cover up highly corrupt or incompetent evaluation activity. Anyone who has been in the evaluation business for very long knows that many potential clients are willing to pay much money for a "good, empowering evaluation," one that conveys the particular message, positive or negative, that the client/interest group hopes to present, irrespective of the data, or one that promotes constructive, ongoing, and nonthreatening group process. . . . Many administrators caught in political conflicts would likely pay handsomely for such friendly, nonthreatening, empowering evaluation service. Unfortunately, there are many persons who call themselves evaluators who would be glad to sell such service. (p. 325)

These are serious concerns that have sparked vigorous debate (e.g., Fetterman 1995). In Chapter 14 on the politics and ethics of utilization-focused evaluation, I'll address these concerns with the seriousness they deserve. For the purpose of concluding this chapter, it is sufficient to note that the utilization-focused evaluator who presents to intended users options that go beyond narrow and traditional uses of findings has an obligation to disclose and discuss objections to such approaches. As evaluators explore new and innovative options, they must be clear that dishonesty, corruption, data distortion, and selling out are not on the menu. Where primary intended users want and need an independent, summative evaluation, that is what they should get. Where they want the evaluator to act independently in bringing forward improvement-oriented findings for formative evaluation, that is what they should get. But those are no longer the only options on the menu of evaluation uses. New participatory, collaborative, intervention-oriented, and developmental approaches are already being used. The utilization-focused issue is not whether such approaches should exist. They already do. The issues are understanding when such
approaches are appropriate and helping intended users make informed decisions about their appropriateness. That's what the next chapter addresses.

Note

1. Sufi stories, particularly those about the adventures and follies of the incomparable Mulla (Master) Nasrudin, are a means of communicating ancient wisdom:

Nasrudin is the classical figure devised by the dervishes partly for the purpose of halting for a moment situations in which certain states of mind are made clear. . . . Since Sufism is something which is lived as well as something which is perceived, a Nasrudin tale cannot in itself produce complete enlightenment. On the other hand, it bridges the gap between mundane life and a transmutation of consciousness in a manner which no other literary form yet produced has been able to attain. (Shah 1964:56)
PART 2
Focusing Evaluations: Choices, Options, and Decisions

Desiderata for the Indecisive and Complacent

Go placidly amid the noise and haste, and remember what peace there may be
in avoiding options. As far as possible, without surrender, be on good terms
with the indecisive. Avoid people who ask you to make up your mind; they are vexations
to the spirit. Enjoy your indecisiveness as well as your procrastinations. Exercise caution
in your affairs lest you be faced with choices, for the world is full of menus. Experience
the joys of avoidance.
You are a child of the universe, no less than the trees and the stars; you have a right to
do and think absolutely nothing. And if you want merely to believe that the universe is
unfolding as it should, avoid evaluation, for it tests reality. Evaluation threatens compla-
cency and undermines the oblivion of fatalistic inertia. In undisturbed oblivion may lie
happiness, but therein resides neither knowledge nor effectiveness.

—Halcolm's Indesiderata
Being Active-Reactive-Adaptive
Evaluator Roles, Situational Responsiveness,
and Strategic Contingency Thinking

Human propensities in the face of evaluation: feline curiosity; stultifying fear;
beguiling distortion of reality; ingratiating public acclamation; inscrutable
selective perception; profuse rationalization; and apocalyptic anticipation. In other words,
the usual run-of-the-mill human reactions to uncertainty.
Once past these necessary initial indulgences, it's possible to get on to the real evalu-
ation issues: What's worth knowing? How will we get it? How will it be used?
Meaningful evaluation answers begin with meaningful questions.

A young hare, born to royal-rabbit parents in a luxury warren, showed unparalleled
speed. He won races far and wide, training under the world's best coach. He boasted
that he could beat anyone in the forest.
The only animal to accept the hare's challenge was an old tortoise. This first amused,
then angered the arrogant hare, who felt insulted. The hare agreed to the race, ridiculing
the tortoise to local sports columnists. The tortoise said simply, "Come what may, I
will do my best."
A course was created that stretched all the way through and back around the forest.
The day of the race arrived. At the signal to start, the hare sped away, kicking dust in
the tortoise's eyes. The tortoise slowly meandered down the track.


Halfway through the race, rain began falling in torrents. The rabbit hated the feel of
cold rain on his luxuriously groomed fur, so he stopped for cover under a tree. The
tortoise pulled his head into his shell and plodded along in the rain.
When the rain stopped, the hare, knowing he was well ahead of the meandering
tortoise and detesting mud, decided to nap until the course dried. The track, however,
was more than muddy; it had become a stream. The tortoise turned himself over, did
the backstroke, and kept up his progress.
By and by, the tortoise passed the napping hare and won the race. The hare blamed
his loss on "unfair and unexpected conditions," but observant sports columnists
reported that the tortoise had beaten the hare by adapting to conditions as he found
them. The hare, it turned out, was only a "good conditions" champion.

Evaluation Conditions

What are good evaluation conditions? Here's a wish list (generated by some colleagues over drinks late one night at the annual conference of the American Evaluation Association). The program's goals are clear, specific, and measurable. Program implementation is standardized and well managed. The project involves two years of formative evaluation working with open, sophisticated, and dedicated staff to improve the program; this is followed by a summative evaluation for the purpose of rendering independent judgment to an interested and knowledgeable funder. The evaluator has ready access to all necessary data and enthusiastic cooperation from all necessary people. The evaluator's role is clear and accepted. There are adequate resources and sufficient time to conduct a comprehensive and rigorous evaluation. The original evaluation proposal can be implemented as designed. No surprises turn up along the way, like departure of the program's senior executive or the report deadline moved up six months.

How often had this experienced group of colleagues conducted an evaluation under such ideal conditions? Never. (Bring on another round of drinks.)

The real world doesn't operate under textbook conditions. Effective evaluators learn to adapt to changed conditions. This requires situational responsiveness and strategic, contingency thinking—what I've come to call being active-reactive-adaptive in working with primary intended users.

By way of example, let me begin by describing how the focus of an evaluation can change over time. To do so, I'll draw on the menu of uses offered in the previous two chapters. In Chapter 4, we considered three uses for findings: rendering summative judgments; improving programs formatively; and generating knowledge about generic patterns of effectiveness. In Chapter 5, we considered four uses of evaluation processes and logic: enhancing communications and understanding; reinforcing a program intervention; supporting participant engagement, ownership, and empowerment; and program or organizational development. The following example will illustrate how these uses can be creatively combined in a single project to build on
Being Active-Reactive-Adaptive • 119

and reinforce each other over time as conditions and needs change. Then we'll consider a menu of evaluator roles and examine some of the situations and contingencies that influence the choice of evaluator roles and type of evaluation.

Changing Uses Over Time

Since the 1970s, Minnesota has been at the forefront in implementing Early Childhood Family Education programs through school districts statewide. These programs offer early screening for children's health and developmental problems; libraries of books, toys, and learning materials; and parent education classes that include parent-only discussions as well as activities with their infants, toddlers, and preschoolers. Parents learn about child development; ways of supporting their child's intellectual, emotional, and physical growth; and how to take care of themselves as parents. Some programs include home visits. A hallmark of the program has been universal access and outreach; that is, the program is not targeted selectively to low-income families or those deemed at risk. The program serves over 260,000 young children and their parents annually. It is the nation's largest and oldest program of its kind.

Evaluation has been critical to the program's development, acceptance, and expansion over the years. Evaluation methods have included parent surveys, field observations of programs, standardized instruments measuring parenting knowledge and skills, interviews with staff and parents, pre-post assessments of parent-child interactions, and videotapes of parent-child interaction (Mueller 1996). I have been involved with these statewide evaluation efforts for over 20 years, so I want to use this case to illustrate the menus of evaluation uses, both use of findings and use of evaluation logic and processes.

Formative evaluation. Parent feedback surveys have been used from the beginning to make the programs responsive to parent needs and interests. More recently, a large-scale, statewide evaluation involving 29 programs has used pre-post interviews and videotapes with parents to share information and results across programs. Staff have discussed program variations, identified populations with which they are more and less successful, and shared materials. The evaluation has become a vehicle for staff throughout the state to share ideas about everything from recruitment to outcomes assessment. Every program in the state can identify improvements made as a result of this evaluation-centered sharing and staff development.

Summative evaluation. Periodically the program has produced formal evaluation reports for the state legislature. A great deal is at stake. For example, in 1992, the program was funded by over $26 million in state aid and local levies. At a time of severe funding cuts for all kinds of programs, the universal access philosophy and operations of the program came under attack. Why should the state fund parent education and parent-child activities for middle-class parents? To save money and more narrowly focus the program, some legislators and educators proposed targeting the program for low-income and at-risk parents. The program's statewide evaluation played a major role in that debate. The summative report, entitled Changing Times, Changing Families (MECFE 1992), was distributed widely both within and outside the

legislature. It described parent outcomes in great detail, showing that middle-class parents had a great deal to learn about parenting. Pre-post interviews with 183 parents showed how parent knowledge, skills, behaviors, and feelings changed. Smaller samples examined effects on single parents and teen parents. The report also included recommendations for program improvement, for example, working harder and more effectively to get fathers involved. The summative evaluation contributed to the legislature's decision to maintain universal access and expand support for early childhood parent education programming. State staff felt that, without a summative evaluation, the program would have been especially vulnerable to questions about the value of serving middle-class parents. The summative report anticipated that policy question, because legislators were identified as primary intended users.

Knowledge-generating evaluation. The fact that program offerings and implementation vary from district to district throughout the state has offered opportunities to synthesize lessons learned. For example, using comparative data about varying degrees of effectiveness in changing parent-child interactions, staff analyzed different ways of working with parents. One theme that emerged was the importance of directly engaging and training parents in how to observe their children. Early in the program, staff trained in child development underestimated the skills and knowledge involved in observing a child. New parents lacked a context or criteria for observing their own children. Having parents and children come together in groups provided opportunities to make observational skills a focus of parent education. This understanding has a number of programmatic implications, but at the level of cross-program synthesis, it represents knowledge generation. The lessons learned by Minnesota staff have been shared with programs in other states, and vice versa. One kind of important lesson learned has been how to use evaluation processes to enhance program effectiveness. We turn, then, to uses of evaluation logic and processes.

Using evaluation to enhance mutual understanding. All evaluation instruments for 20 years have been developed with full staff participation. At full-day sessions involving program directors from all over the state, rural, urban, and state staff have shared understandings about program priorities and challenges as they have operationalized outcomes. State staff have shared legislators' priorities. Rural staff have shared parents' concerns. All have discussed, and sometimes debated, what kinds of changes are possible, important, and/or crucial. The evaluation became a mechanism for formalizing, sharing, and communicating the program's philosophy, priorities, and approaches among diverse directors separated sometimes by hundreds of miles.

Intervention-oriented evaluation. In each program, a sample of parents was selected for pre-post interviews and videotapes of parent-child interactions. Staff, after evaluation training, conducted the interviews and made the videotapes. Staff soon discovered that the processes of interviewing and videotaping were powerful interventions. In the course of data collection, staff and parents got to know each other, developed rapport and trust, and discussed parents' concerns. Soon, parents not included in the study design were asking to be interviewed and video-

taped. Some programs have decided to continue pre-post interviews with all parents and are routinely videotaping and reviewing the results with parents. The interviews and videotapes support and reinforce the program's goal of making parents more reflective and observant about their parenting. Data collection has become a valued part of the program.

Participation, collaboration, and empowerment. The staff has had complete ownership of the evaluation from the beginning. From determining the focus of each subsequent evaluation through data collection and analysis, staff have participated fully. My role, and that of my evaluation colleagues, has been to support and facilitate the process. Program directors have reported feeling affirmed by the research knowledge they have gained. Most recently, they have been interpreting factor analysis and regression coefficients generated from the latest statewide effort. They have learned how to interpret other evaluation and research studies in the course of working on their own evaluations. They have taken instruments developed for statewide evaluation and adapted them for ongoing local program use. They feel competent to discuss the results with school superintendents and state legislators. They're also able to engage with confidence in discussions about what can and can't be measured.

Developmental evaluation. I've been involved with many of these program directors for 20 years. Over the years, we've wrestled with questions of how knowledge change relates to behavior change, how much importance to attach to attitudinal change and increases in parent confidence, and what outcomes to monitor among children. In so doing, we've been engaged in a long-term process of model specification and program development that goes well beyond and has a larger impact than simply deciding what data to collect in the next round of evaluation. The evaluation deliberation process has become a vehicle for program development beyond use of findings about effectiveness.

Variable Evaluator Roles Linked to Variable Evaluation Purposes

Different types of and purposes for evaluation call for varying evaluator roles. Gerald Barkdoll (1980), as associate commissioner for planning and evaluation of the U.S. Food and Drug Administration, identified three contrasting evaluator roles. His first type, evaluator as scientist, he found was best fulfilled by aloof academics who focus on acquiring technically impeccable data while studiously staying above the fray of program politics and utilization relationships. His second type he called "consultative" in orientation; these evaluators were comfortable operating in a collaborative style with policymakers and program analysts to develop consensus about their information needs and decide jointly the evaluation's design and uses. His third type he called the "surveillance and compliance" evaluator, a style characterized by aggressively independent and highly critical auditors committed to protecting the public interest and ensuring accountability (e.g., Walters 1996). These three types reflect evaluation's historical development from three different traditions: (1) social science research; (2) pragmatic field practice, especially by internal evaluators and consultants; and (3) program and financial auditing.

When evaluation research aims to generate generalizable knowledge about causal linkages between a program intervention and outcomes, rigorous application of social science methods is called for and the evaluator's role as methodological expert will be primary. When the emphasis is on determining a program's overall merit or worth, the evaluator's role as judge takes center stage. If an evaluation has been commissioned because of and is driven by public accountability concerns, the evaluator's role as independent auditor, inspector, or investigator will be spotlighted for policymakers and the general public. When program improvement is the primary purpose, the evaluator plays an advisory and facilitative role with program staff. As a member of a design team, a developmental evaluator will play a consultative role. If an evaluation has a social justice agenda, the evaluator becomes a change agent.

In utilization-focused evaluation, the evaluator is always a negotiator—negotiating with primary intended users what other roles he or she will play. Beyond that, all roles are on the table, just as all methods are options. Role selection follows from and is dependent on intended use by intended users.

Consider, for example, a national evaluation of food stamps to feed low-income families. For purposes of accountability and policy review, the primary intended users are members of the program's oversight committees in Congress (including staff to those committees). The program is highly visible, costly, and controversial, especially because special interest groups differ about its intended outcomes and who should be eligible. Under such conditions, the evaluation's credibility and utility will depend heavily on the evaluators' independence, ideological neutrality, methodological expertise, and political savvy.

Contrast such a national accountability evaluation with an evaluator's role in helping a small, rural leadership program of the Cooperative Extension Service increase its impact. The program operates in a few local communities. The primary intended users are the county extension agents, elected county commissioners, and farmer representatives who have designed the program. Program improvement to increase participant satisfaction and behavior change is the intended purpose. Under these conditions, the evaluation's use will depend heavily on the evaluator's relationship with design team members. The evaluator will need to build a close, trusting, and mutually respectful relationship to effectively facilitate the team's decisions about evaluation priorities and methods of data collection and then take them through a consensus-building process as results are interpreted and changes agreed on.

These contrasting case examples illustrate the range of contexts in which program evaluations occur. The evaluator's role in any particular study will depend on matching her or his role with the context and purposes of the evaluation as negotiated with primary intended users.

Academic Versus Service Orientations

One of the most basic role divisions in the profession is that between academic and service-oriented evaluators, a division identified by Shadish and Epstein (1987) when they surveyed a stratified random sample of the members of the Evaluation Network and the Evaluation Research Society, the two organizations now merged as the American Evaluation Association. The authors inquired about a variety of issues related to evaluators' values and practices. They found that responses clus-

tered around two contrasting views of evaluation. Academic evaluators tend to be at universities and emphasize the research purposes of evaluation, traditional standards of methodological rigor, summative outcome studies, and contributions to social science theory. Service evaluators tend to be independent consultants or internal evaluators and emphasize serving stakeholders' needs, program improvement, qualitative methods, and assisting with program decisions. According to Shadish and Epstein, "The general discrepancy between service-oriented and academically oriented evaluators seems warranted on both theoretical and empirical grounds" (p. 587).

In addition, Shadish and Epstein (1987) found that 31% of the respondents described their primary professional identity as that of "evaluator" (p. 560). Others thought of themselves first as a psychologist, sociologist, economist, educator, and so on, with identity as an evaluator secondary. Evaluators whose primary professional identity was evaluation were more likely to manifest the service/stakeholder orientation, with an emphasis on formative evaluation and commitment to improved program decision making. Those who did not identify primarily as evaluators (but rather took their primary identity from a traditional academic discipline) were significantly more likely to be engaged in academic evaluative research emphasizing research outcomes and summative judgments (p. 581).

Enter Morality: Activist Versus Academic Evaluation

The profession of evaluation remains very much split along these lines, but with new twists and, perhaps, deeper antagonisms. The schism erupted openly, and perhaps deepened, in the early 1990s, when the American Evaluation Association elected successive presidents who represented two quite divergent perspectives. While on the surface the debate was partly about methods—quantitative versus qualitative, a fray we shall enter in Chapter 12—the more fundamental conflict centered on vastly different images of the profession.

Yvonna Lincoln (1991), in her 1990 presidential address, advocated what I would call an activist role for evaluators, one that goes beyond just being competent applied researchers who employ traditional scientific methods to study programs—the academic perspective. She first lamented and then disputed the notion that " 'science' is about knowing and 'art' is about feeling and spiritual matters" (p. 2). She went on to talk about the need for an "arts and sciences of evaluation," plurals (emphasis in the original), in which she identified four new sciences and six new arts for evaluation (pp. 2-6).

Lincoln's New Sciences of Evaluation

1. The science of locating interested stakeholders
2. The science of getting information—good, usable information—to those same stakeholders
3. The science of teaching various stakeholder groups how to use information to empower themselves
4. A science of communicating results

Lincoln's New Arts of Evaluation

1. The art of judgment, not only our own, but eliciting the judgments of stakeholders
2. The art of "appreciating" in our stakeholders and in ourselves, that is, comprehending meaning within a context . . . , seeing something fully

3. The art of cultural analysis
4. The art of hearing secret harmonies, that is, listening for meanings
5. The art of negotiating, not only our contracts, but the worlds in which our target populations and audiences live
6. The art of dealing with people different from ourselves

Lincoln (1991) closed her speech by asserting that "my message is a moral one." Because evaluators are a powerful group, these new arts and sciences have profound moral implications, she argued, including speaking truth to power,

and to make that truth grounded in lived experience and in multiple voices. . . . We need to move beyond cost-benefit analyses and objective achievement measures to interpretive realms . . . to begin talking about what our programs mean, what our evaluations tell us, and what they contribute to our understandings as a culture and as a society. We need literally to begin to shape—as a shaman would—the dreams of all of us into realities. (p. 6)

The following year, the American Evaluation Association president was Lee Sechrest, who by his own definition represented the traditional, academic view of evaluation. He objected to Lincoln's metaphorical call for a new generation of evaluators. "I ask myself," Sechrest (1992) mused, "could we of the preceding generations possibly have given rise to this new, Fourth Generation? Where in our makeup are the origins of this new creature so unlike us. . . . I sense a very real and large generational gap" (p. 2). He went on to argue the merits of traditional scholarly approaches to evaluation, especially the use of quantitative and experimental methods.

But what alienated Sechrest the most was the tone of moral superiority he heard in Lincoln's call for a new arts and sciences of evaluation. Referring to a notice he had seen, presumably inspired by Lincoln's perspective, that invited organizing "the New Power Generation" into a group that might be called The Moral Evaluators, he replied,

The most offensive part of the text, however, is the arrogation of the term "moral" by these new generation, rebellious evaluators. As constructionists, they should know that morality, like so many other things, is in the eye of the beholder. They do not look so extraordinarily moral to me. (p. 5)

Sechrest (1992) closed by implying that the activist stance advocated by Lincoln, presumably based on commitment to shaping a better world, could be viewed as corrupt. Noting that academic evaluators who use traditional quantitative methods also care about finding programs that work, he chided, "They are simply not willing to fudge very much to do so" (p. 6).

What Shadish and Epstein (1987) originally distinguished as academic versus service orientations has evolved, I believe, into different conceptions of evaluator activism that continue to split the profession. The Lincoln-Sechrest debate foreshadowed a no less rancorous exchange between two more evaluation luminaries, Dan Stufflebeam (1994) and David Fetterman (1995), about the morality of empowerment evaluation. Stufflebeam fears that such an activist orientation will undermine the credibility and integrity of the field. Fetterman sees evaluator activism as realizing the full potential of the profession to contribute to creating a better world, especially for the disempowered.

The degree of evaluator activism is a continuum, the endpoints of which have

been defined by Lincoln and Fetterman on the activist side and by Sechrest and Stufflebeam on the academic side. One perspective places evaluators in the fray, even arguing that we have a moral obligation to acknowledge our power and use it to help those in need who lack power. The other perspective argues that evaluation's integrity and long-term contribution to shaping a better world depend on not being perceived as advocates, even though we push for use. Eleanor Chelimsky (1995a), a longtime champion of evaluation use and the person who conceived the goal "intended use by intended users," took the occasion of her 1995 presidential address to the American Evaluation Association to warn against being perceived as taking sides.

What seems least well understood, in my judgment, is the dramatically negative and long-term impact on credibility of the appearance of advocacy in an evaluation. There is a vast cemetery out there of unused evaluation findings that have been loudly or quietly rejected because they did not "seem" objective. In short, evaluators' survival in a political environment depends heavily on their credibility, as does the use of their findings in policy. (p. 219)

My own view, focused as always on utility, is that these different stances, indeed the whole continuum of evaluator activism, constitute options for discussion and negotiation with primary intended users. Chelimsky's primary intended users were members of Congress. Fetterman's were, among others, disenfranchised and oppressed Blacks in South African townships.

Both national policymakers and people in poverty can benefit from evaluation, but not in the same ways and not with the same evaluator roles. Neither more nor less activism, in my judgment, is morally superior. Various degrees of activism involve different ways to practice as an evaluator, often in different arenas. Indeed, how activist to be involves consideration of an evaluation's purpose, decisions about intended users and uses, and the evaluator's own values and commitments, all of which need to be made explicit. The challenge will be to create appreciation for such diversity among those both within and outside the profession who have a single and narrow view of evaluation and its practice. The debate will, and should, go on, for that's how we discover the implications and ramifications of diverse approaches, but I foresee and desire no turning back of the clock to a single dominant perspective.

In their original research on the emerging schism in evaluation, Shadish and Epstein (1987) anticipated that, while the profession's diversity can help make the field unique and exciting, it also has the potential for increasing tensions between activist and academic interests, "tensions that arise because of the different demands and reward structures under which the two groups often operate" (p. 587). They went on to note that such tensions could lead to polarization, citing as evidence the debate within psychology between practicing versus academic clinical psychologists, which has led to a major schism there. They concluded,

To the extent that the underlying causes are similar—and there are indeed some important relevant similarities in the political and economic characteristics of the two professions—the lesson to evaluation is clear. Program evaluation must continue its efforts to accommodate diverse interests in the same profession. . . . In the long run, evaluation will not be well served by parochialism of any

kind—in patterns of practice or anything else. (p. 588)

As the professional practice of evaluation has become increasingly diverse, the potential roles and relationships have multiplied. Menu 6.1 offers a range of dimensions to consider in defining the evaluator's relationship to intended users. Menu 6.2 presents options that can be considered in negotiations with intended users. The purpose of these menus is to elaborate the multiple roles now available to evaluators and the kind of strategic, contingency thinking involved in making role decisions.

I would hope that these menus help communicate the rich diversity of the field, for I agree with Shadish and Epstein (1987) that "in the long run, evaluation will not be well served by parochialism of any kind—in patterns of practice or anything else" (p. 588). A parochial practice is one that repeats the same patterns over and over. A pluralistic and cosmopolitan practice is one that adapts evaluation practices to new situations.

Situational Evaluation

There is no one best way to conduct an evaluation.

This insight is critical. The design of a particular evaluation depends on the people involved and their situation. Situational evaluation is like situational ethics (Fletcher 1966), situational leadership (Blanchard 1986; Hersey 1985), or situated learning: "Action is grounded in the concrete situation in which it occurs" (Anderson et al. 1996:5). The standards and principles of evaluation (see Chapters 1 and 2) provide overall direction, a foundation of ethical guidance, and a commitment to professional competence and integrity, but there are no absolute rules an evaluator can follow to know exactly what to do with specific users in a particular situation. That's why Newcomer and Wholey (1989) concluded in their synthesis of knowledge about evaluation strategies for building high-performance programs: "Prior to an evaluation, evaluators and program managers should work together to define the ideal final product" (p. 202). This means negotiating the evaluation's intended and expected uses.

Every evaluation situation is unique. A successful evaluation (one that is useful, practical, ethical, and accurate) emerges from the special characteristics and conditions of a particular situation—a mixture of people, politics, history, context, resources, constraints, values, needs, interests, and chance. Despite the rather obvious, almost trite, and basically commonsense nature of this observation, it is not at all obvious to most stakeholders, who worry a great deal about whether an evaluation is being done "right." Indeed, one common objection stakeholders make to getting actively involved in designing an evaluation is that they lack the knowledge to do it right. The notion that there is one right way to do things dies hard. The right way, from a utilization-focused perspective, is the way that will be meaningful and useful to the specific evaluators and intended users involved, and finding that way requires interaction, negotiation, and situational analysis.

Alkin (1985) identified some 50 factors associated with use. He organized them into four categories:

1. Evaluator characteristics, such as commitment to make use a priority, willingness
MENU 6.1

1. Relationship with primary intended users:
   Distant from/noninteractive <——> Close to/highly interactive

2. Control of the evaluation process:
   Evaluator directed and controlled; evaluator as primary decision maker <——> Directed by primary intended users; evaluator consults

3. Scope of intended user involvement:
   Very narrow; primarily as audience for findings <——> Involved in some parts (usually focus but not methods or analysis) <——> Involved in all aspects of the evaluation from start to finish

4. Number of primary intended users and/or stakeholders engaged:
   None <——> One <——> A few <——> Many <——> All constituencies

5. Variety of primary intended users engaged:
   Homogeneous <——> Heterogeneous
   Dimensions of heterogeneity:
   (a) Position in program (funders, board executives, staff, participants, community members, media, onlookers)
   (b) Background variables: cultural/racial/ethnic/gender/social class
   (c) Regional: geographically near or far
   (d) Evaluation: sophistication and experience
   (e) Ideology (political perspective/activism)

6. Time line for the evaluation:
   Tight deadline; little time for processing with users <——> Long developmental timeline; time for processing with users
MENU 6.2
Optional Evaluator Roles

(Columns: most likely primary users; primary evaluator roles; dominant style of evaluator; primary evaluation purpose; evaluator characteristics affecting use)

1. Primary users: funders, officials, decision makers. Evaluator role: judge. Style: authoritative. Purpose: summative determination of overall merit or worth. Characteristics affecting use: perceived independence; methodological expertise; substantive expertise; perceived neutrality.

2. Primary users: funders, policymakers, board members. Evaluator roles: auditor, inspector, investigator. Style: independent. Purpose: accountability; compliance; adherence to rules. Characteristics affecting use: independence; perceived toughness; detail-oriented.

3. Primary users: academics, planners, program designers, policy specialists. Evaluator role: researcher. Style: knowledgeable. Purpose: generate generalizable knowledge; truth. Characteristics affecting use: methodological expertise; academic credentials; scholarly status; peer review support.

4. Primary users: program staff, program executives and administrators, participants. Evaluator role: consultant for program improvement. Style: interactive, perceptive, insightful. Purpose: program improvement. Characteristics affecting use: perceived understanding of program; trust; rapport; insightfulness.

5. Primary users: diverse stakeholders. Evaluator role: evaluation facilitator. Style: available, balanced, empathic. Purpose: facilitate judgments and recommendations by non-evaluators. Characteristics affecting use: interpersonal skills; group facilitation skills; evaluator knowledge; consensus-building skills.

6. Primary users: program design team. Evaluator role: team member with evaluation perspective. Style: participatory, questioning, challenging. Purpose: program development. Characteristics affecting use: contribution to team; insightfulness; ability to communicate evaluation perspective; analytical leadership.

7. Primary users: program staff and participants. Evaluator role: collaborator. Style: involved, supportive, encouraging. Purpose: action research and evaluation on groups' own issues; participatory evaluation. Characteristics affecting use: accepting of others; mutual respect; communication skills; perceived genuineness of collaborative approach.

8. Primary users: program participants/community members. Evaluator role: empowerment facilitator. Style: resource person. Purpose: participant self-determination; pursuit of political agenda. Characteristics affecting use: mutual respect; participation; enabling skills; political savvy.

9. Primary users: ideological adherents. Evaluator role: supporter of cause. Style: co-leader, committed. Purpose: social justice. Characteristics affecting use: engagement; commitment; political expertise; knowledge of "the system."

10. Primary users: future evaluation planners and users. Evaluator roles: synthesizer, meta-evaluator, cluster leader. Style: analytical. Purpose: synthesize findings from multiple evaluations; judge quality of evaluations. Characteristics affecting use: professionalism; analytical insightfulness; conceptual brilliance; adherence to standards.

to involve users, political sensitivity, and credibility
2. User characteristics, such as interest in the evaluation, willingness to commit time and energy, and position of influence
3. Contextual characteristics, such as size of organization, political climate, and existence of competing information
4. Evaluation characteristics, such as nature and timing of the evaluation report, relevance of evaluation information, and quality of the data and evaluation

Menu 6.3 offers examples of situations that pose special challenges to evaluation use and the evaluator's role.

Exhibit 6.1 (pages 132-3) is a look at a few of the many situational variables an evaluator may need to be aware of and take into account in conducting a utilization-focused, feasibility-conscious, propriety-oriented, and accuracy-based evaluation. The situational variables in Exhibit 6.1 are presented in no particular order. Most of them could be broken down into several additional variables. I have no intention of trying to operationalize these dimensions (that is, make them clear, specific, and measurable). The point of presenting them is simply to emphasize and reiterate this: Situational evaluation means that evaluators have to be prepared to deal with a lot of different people and situations. If we conceive of just three points (or situations) on each of these dimensions—the two endpoints and a midpoint—then the combinations of these 20 dimensions represent 8,000 unique evaluation situations.

Nor are these static situations. The program you thought was new at the first session turns out to have been created out of and to be a continuation of another program; only the name has been changed to protect the guilty. You thought you were dealing with only one primary decision maker at the outset, and suddenly you have stakeholders coming out your ears, or vice versa. With some programs, I've felt like I've been through all 8,000 situations in the first month.

And, in case 8,000 situations to analyze, be sensitive to, and design evaluations for doesn't seem challenging enough, just add two more points to each dimension—a point between each endpoint and the midpoint. Now, combinations of the five points on all 20 dimensions yield 3,200,000 potentially different situations. Perhaps such complexity helps explain why the slogan that won the hearts of evaluators in attendance at the 1978 Evaluation Network conference in Aspen, Colorado, was Jim Hennes's lament:

Evaluators do IT
under difficult circumstances.

Of course, one could make the same analysis for virtually any area of decision making, couldn't one? Life is complex, so what's new? First, let's look at what's old. The evidence from social and behavioral science is that in other areas of decision making, when faced with complex choices and multiple situations, we fall back on a set of rules and standard operating procedures that predetermine what we will do, that effectively short-circuit situational adaptability. The evidence is that we are running most of the time on preprogrammed tapes. That has always been the function of rules of thumb and scientific paradigms. Faced with a new situation, the evaluation researcher (unconsciously) turns to old and comfortable patterns. This may help explain why so many evaluators who have rhetorically embraced the philosophy of situational evaluation find that the approaches in which they are trained
Being Active-Reactive-Adaptive • 131
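The arithmetic behind the situation counts above can be made explicit. The figures 8,000 and 3,200,000 are consistent with taking the 20 dimensions as the base and the number of points per dimension as the exponent; that reading is an assumption on my part rather than a computation the text spells out:

```latex
% Assumed reading: count = dimensions^{points per dimension}
20^{3} = 8000 \qquad\text{and}\qquad 20^{5} = 3200000
```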

MENU 6.3
Examples of Situations That Pose Special Challenges to Evaluation Use and the Evaluator's Role

1. Highly controversial issue. Challenge: facilitating different points of view. Skills needed: conflict resolution skills.

2. Highly visible program. Challenge: dealing with publicity about the program; reporting findings in a media circus. Skills needed: public presentation skills; graphic skills; media handling skills.

3. Highly volatile program environment. Challenge: rapid change in context, issues, and focus. Skills needed: tolerance for ambiguity; rapid responsiveness; being a "quick study."

4. Cross-cultural or international. Challenge: including different perspectives and values; being aware of cultural blinders and biases. Skills needed: cross-cultural sensitivity; skills in understanding and incorporating different perspectives.

5. Team effort. Challenge: managing people. Skills needed: identifying and using individual skills of team members; team-building skills.

6. Evaluation attacked. Challenge: preserving credibility. Skills needed: calm; staying focused on evidence and conclusions.

7. Corrupt program. Challenge: resolving ethical issues and upholding standards. Skills needed: integrity; clear ethical sense.
and with which they are most comfortable just happen to be particularly appropriate in each new evaluation situation they confront—time after time after time. Sociologists just happen to find doing a survey appropriate. Economists just happen to feel the situation needs cost-benefit analysis. Psychologists study the situation and decide that—surprise!—testing would be appropriate. And so it goes.

Utilization-focused evaluation is a problem-solving approach that calls for creative adaptation to changed and changing conditions, as opposed to a technical approach, which attempts to mold and define conditions to fit preconceived models of how things should be done. Utilization-focused evaluation involves overcoming what Brightman and Noble (1979) have identified as "the ineffective education of decision scientists." They portray the typical decision scientist (a generic term for evaluators, policy analysts, planners, and so on) as follows:

EXHIBIT 6.1
Examples of Situational Factors in Evaluation That Can Affect Users' Participation and Use

Each factor ranges between the endpoints shown.

1. Number of stakeholders to be dealt with: One primary decision maker ... Large number

2. Purpose of the evaluation: Formative purpose (improvement) ... Summative purpose (funding decision)

3. History of the program: New program ... Long history

4. Staff attitude toward evaluation: Enthusiasm ... Resistance

5. Staff knowledge about evaluation: Knows virtually nothing ... Highly knowledgeable

6. Program interaction patterns (administration-staff, staff-staff, ...): Cooperative ... Conflict-laden

7. Program's prior evaluation experience: First time ever ... Seemingly endless

8. Staff and participants' education: High ... Low

9. Staff and/or participants' characteristics (pick any 10 you want): Homogeneous groups ... Heterogeneous groups

hopelessly naive and intellectually arrogant. Naive because they believe that problem solving begins and ends with analysis, and arrogant because they opt for mathematical rigor over results. They are products of their training. Decision science departments appear to have been more effective at training technocrats to deal with structured problems than problem solvers to deal with ill-structured ones. (p. 150)

Narrow technocratic approaches emphasize following rules and standard operating procedures. Creative problem-

EXHIBIT 6.1 (continued)

10. Program location: One site ... Multiple sites

11. Resources available for the evaluation: No money to speak of ... Substantial funding

12. Number of sources of program funding: One funding source ... Multiple funding sources

13. Nature of the program treatment: Simple and unidimensional ... Complex and multidimensional

14. Standardization of treatment: Highly standardized and routine ... Highly individualized and nonroutine

15. Program organizational decision-making structure: Horizontal, little hierarchy, little stratification ... Hierarchical, long chain of command, stratified

16. Clarity about evaluation purpose and function: Well-articulated, specifically defined ... Ambiguous, broadly defined

17. Existing data on program: Operating information system ... No existing data

18. Evaluator(s)' relationship to the program: External ... Internal

19. Impetus for the evaluation: Voluntary, self-initiated ... Required, forced on program

20. Time available for the evaluation: Long time line, open ... Short time line, fixed deadline

solving approaches, in contrast, focus on what works and what makes sense in the situation. Standard methods recipe books aren't ignored. They just aren't taken as the final word. New ingredients are added to fit particular tastes. Homegrown or locally available ingredients replace the processed foods of the national supermarket chains, with the attendant risks of both greater failure and greater achievement.

Lawrence Lynn's (1980a) ideal policy analyst bears a striking resemblance to my idea of a creative and responsive utilization-focused evaluator.

Individuals really do have to be interdisciplinary; they have to be highly catholic in their taste for intellectual concepts and ideas and tools. I do not think we are talking so much about acquiring a specific kind of knowledge or a specialist's knowledge in order to deal with environmental issues or energy issues. One does not have to know what a petroleum engineer knows, or what an air quality engineer knows, or what a chemist knows. Rather, one simply has to be able to ask questions of many disciplines and many professions and know how to use the information. And what that says is, I think, one has to be intellectually quite versatile. It is not enough to be an economist . . . , an operations research specialist, [or] a statistician. One has to be a little bit of all of those things. One has to have an intuitive grasp of an awful lot of different intellectual approaches, different intellectual disciplines or traditions so that one can range widely in doing one's job of crafting a good analysis, so that you are not stuck with just the tools you know. I think, then, the implication is versatility and an intuitive grasp of a fairly wide range of different kinds of skills and approaches. (p. 88; emphasis added)

Learning to Be Situationally Responsive

Expert evaluators are sophisticated at situation recognition. Such expertise does not develop overnight, nor is it an outcome of training. Expertise comes from practice. Consider expertise in chess as a metaphor for developing situationally responsive expertise in evaluation.

It takes at least 15 years of hard work for even the most talented individuals to become world-class chess masters: what they seem to learn is a repertoire for recognizing types of situations or scripts or intuitive sensibilities and understandings about how these situations will largely unfold. Simon estimates a differential repertoire of 50,000 situation recognitions at the world-class chess level. There is also some increase in overall long-range strategic planning ability—beginners typically are hard pressed to go beyond one move deep; world-class players often anticipate 3 or sometimes 5 future moves in calculating alternative reactions to their moves. . . . One further learning is the capacity to diagnose not just specific game situations but to model or "psyche out" different opponents. (Etheredge 1980:243)

I suggest there's a parallel here to anticipating potential use and knowing how to facilitate it. Etheredge (1980) also found that experienced players develop efficient scanning techniques and the ability to discard unnecessary information. They cultivate what seems like an intuitive sense but is really a practiced sense of where to devote attention. You will be hard-pressed, in my view, to find a better description of evaluation expertise in working with intended users. Effective facilitation involves situation recognition and responsiveness, anticipation, and the ability to analyze people—knowing where, when, and how to focus attention. These are learned and practiced behaviors, a view I assert when someone suggests that utilization-focused evaluation only works for certain personality types.

Being Active-Reactive-Adaptive

In the title of this chapter, I used the phrase "active-reactive-adaptive" to suggest the nature of the consultative interactions that go on between evaluators and intended users. The phrase is meant to be

both descriptive and prescriptive. It describes how real-world decision making actually unfolds. Yet, it is prescriptive in alerting evaluators to consciously and deliberately act, react, and adapt in order to increase their effectiveness in working with stakeholders.

Utilization-focused evaluators are, first of all, active in deliberately and calculatedly identifying intended users and focusing useful questions. They are reactive in listening to intended users and responding to what they learn about the particular situation in which the evaluation unfolds. They are adaptive in altering evaluation questions and designs in light of their increased understanding of the situation and changing conditions. Active-reactive-adaptive evaluators don't impose cookbook designs. They don't do the same thing time after time. They are genuinely immersed in the challenges of each new setting and authentically responsive to the intended users of each new evaluation.

It is the paradox of decision making that effective action is born of reaction. Only when organizations and people take in information from the environment and react to changing conditions can they act on that same environment to reduce uncertainty and increase discretionary flexibility (see Thompson 1967). The same is true for the individual decision maker or for a problem-solving group. Action emerges through reaction and leads to adaptation. The imagery is familiar: thesis-antithesis-synthesis; stimulus-response-change.

This active-reactive-adaptive stance characterizes all phases of evaluator-user interactions, from initially identifying primary

intended users to focusing relevant questions, choosing methods, and analyzing results. All phases involve collaborative processes of action-reaction-adaptation as evaluators and intended users consider their options. The menu of choices includes a broad range of methods, evaluation ingredients from bland to spicy, and a variety of evaluator roles: collaborator, trainer, group facilitator, technician, politician, organizational analyst, internal colleague, external expert, methodologist, information broker, communicator, change agent, diplomat, problem solver, and creative consultant. The roles played by an evaluator in any given situation will depend on the evaluation's purpose, the unique constellation of conditions with which the evaluator is faced, and the evaluator's own personal knowledge, skills, style, values, and ethics.

The mandate to be active-reactive-adaptive in role-playing provokes protest from those evaluators and intended users who advocate only one narrow role, namely, that the evaluator renders judgment about merit or worth—nothing else (Stufflebeam 1994; Scriven 1991a). Clearly, I have a more expansive view of an evaluator's role possibilities and responsibilities. Keeping in mind that the idea of multiple evaluator roles is controversial, let's turn to look at what the evaluator brings to the utilization-focused negotiating table.

Multiple Evaluator Roles and Individual Style

The evaluator as a person in his or her own right is a key part of the situational mix. Each evaluation will be unique in part because individual evaluators are unique. Evaluators bring to the negotiating table their own style, personal history, and professional experience. All of the techniques and ideas presented in this book must be adapted to the style of the individuals using them.

Cousins, Donohue, and Bloom (1996) surveyed North American evaluators to find out what variables correlated with a collaborative style of practice. Organizational affiliation, gender, and primary job responsibility did not differentiate practice and opinion responses. Canadian evaluators reported greater depth of stakeholder involvement than Americans. Most telling, however, were years and depth of experience with collaborative approaches. More experienced evaluators expected more use of their evaluations and reported a greater sense of satisfaction from the collaborative process and greater impacts of the resulting evaluations. In essence, evaluators get better at the active-reactive-adaptive process the more they experience it; and the more they use it, the more they like it and the more impact they believe it has.

Being active-reactive-adaptive explicitly recognizes the importance of the individual evaluator's experience, orientation, and contribution by placing the mandate to be active first in this consulting triangle. Situational responsiveness does not mean rolling over and playing dead (or passive) in the face of stakeholder interests or perceived needs. Just as the evaluator in utilization-focused evaluation does not unilaterally impose a focus and set of methods on a program, so, too, the stakeholders are not set up to impose their initial predilections unilaterally or dogmatically. Arriving at the final evaluation design is a negotiated process that allows the values and capabilities of the evaluator to intermingle with those of intended users.

The utilization-focused evaluator, in being active-reactive-adaptive, is one among many at the negotiating table. At times

there may be discord in the negotiating process; at other times harmony. Whatever the sounds, and whatever the themes, the utilization-focused evaluator does not sing alone. He or she is part of a choir made up of primary intended users. There are solo parts, to be sure, but the climactic theme song of utilization-focused evaluation is not Frank Sinatra's "I Did It My Way." Rather, it's the full chorus joining in a unique, situationally specific rendition of "We Did It Our Way."

User Responsiveness and Technical Quality

User responsiveness should not mean a sacrifice of technical quality. Later chapters will discuss in detail the utilization-focused approach to ensuring technical quality. A beginning point is to recognize that standards of technical quality vary for different users and varying situations. The issue is not meeting some absolute research standards of technical quality but, rather, making sure that methods and measures are appropriate to the validity and credibility needs of a particular evaluation purpose and specific intended users.

Jennifer Greene (1990) examined in depth the debate about technical quality versus user responsiveness. She found general agreement that both are important but disagreement about the relative priority of each. She concluded that the debate is really about how much to recognize and deal with evaluation's political inherency:

Evaluators should recognize that tension and conflict in evaluation practice are virtually inevitable, that the demands imposed by most if not all definitions of responsiveness and technical quality (not to mention feasibility and propriety) will characteristically reflect the competing politics and values of the setting. (p. 273)

She then recommended that evaluators "explicate the politics and values" that undergird decisions about purpose, audience, design, and methods. Her recommendation is consistent with utilization-focused evaluation.

Respect for Intended Users

One central value that should undergird the evaluator's active-reactive-adaptive role is respect for all those with a stake in a program or evaluation. In their seminal article on evaluation use, Davis and Salasin (1975) asserted that evaluators were involved inevitably in facilitating change and that "any change model should . . . generally accommodate rather than manipulate the view of the persons involved" (p. 652). Respectful utilization-focused evaluators do not use their expertise to intimidate or manipulate intended users. Egon Guba (1977) has described in powerful language an archetype that is the antithesis of the utilization-focused evaluator:

It is my experience that evaluators sometimes adopt a very supercilious attitude with respect to their clients; their presumptuousness and arrogance are sometimes overwhelming. We treat the client as a "childlike" person who needs to be taken in hand; as an ignoramus who cannot possibly understand the tactics and strategies that we will bring to bear; as someone who doesn't appreciate the questions he ought to ask until we tell him—and what we tell him often reflects our own biases and interests rather than the problems with which the client is actually beset. The phrase "Ugly American" has emerged in international settings to de-

scribe the person who enters into a new culture, immediately knows what is wrong with it, and proceeds to foist his own solutions onto the locals. In some ways I have come to think of evaluators as "Ugly Americans." And if what we are looking for are ways to manipulate clients so that they will fall in with our wishes and cease to resist our blandishments, I for one will have none of it. (p. 1; emphasis in original)

For others who "will have none of it," there is the alternative of undertaking a utilization-focused evaluation process based on mutual respect between evaluators and intended users.

Internal and External Evaluators

One of the most fundamental issues in considering the role of the evaluator is the location of the evaluator inside or outside the program and organization being evaluated, what has sometimes been called the "in-house" versus "outhouse" issue. The early evaluation literature was aimed primarily at external evaluators, typically researchers who conducted evaluations under contract to funders. External evaluators come from universities, consulting firms, and research organizations or work as independent consultants. The defining characteristic of external evaluators is that they have no long-term, ongoing position within the program or organization being evaluated. They are therefore not subordinated to someone in the organization and not directly dependent on the organization for their job and career.

External evaluators are valuable precisely because they are outside the organization. It is typically assumed that their external status permits them to be more independent, objective, and credible than internal evaluators. Internal evaluations are suspect because, it is presumed, they can be manipulated more easily by administrators to justify decisions or pressured to present positive findings for public relations purposes (House 1986). Of course, external evaluators who want future evaluation contracts are also subject to pressure to produce positive findings. In addition, external evaluators are typically more costly, less knowledgeable about the nuances and particulars of the local situation, and less able to follow through to facilitate the implementation of recommendations. When external evaluators complete their contract, they may take with them a great deal of knowledge and insight that is lost to the program. That knowledge stays "in-house" with internal evaluators. External evaluators have also been known to cause difficulties in a program through insensitivity to organizational relationships and norms, one of the reasons the work of external evaluators is sometimes called "outhouse" work.

One of the major trends in evaluation during the 1980s was a transition from external to internal evaluation, with Canadian Arnold Love (1991, 1983) documenting and contributing to the development of internal evaluation. At the beginning of the 1970s, evaluation was just emerging as a profession. There were fewer distinct evaluation units within government bureaus, human service agencies, and private sector organizations than there are now. School districts had research and evaluation units, but even they contracted out much of the evaluation work mandated by the landmark 1965 Elementary and Secondary Education Act in the United States. As evaluation became more pervasive in the 1970s, as the mandate for evaluation was

added to more and more legislation, and as training for evaluators became more available and widespread, internal evaluation units became more common. Now, most federal, state, and local agencies have internal evaluation units; international organizations also have internal evaluation divisions; and it is clear that "internal evaluators can produce evaluations of high quality that meet rigorous standards of objectivity while still performing useful service to administrators if they have previously established an image of an independent but active voice in the organizational structure" (Sonnichsen 1987:34-35).

Over the years, I have had extensive contact with internal evaluators through training and consulting, working closely with several of them to design internal monitoring and evaluation systems. For the second edition of this book, I interviewed 10 internal evaluators who I knew used a utilization-focused approach. Their comments about how they have applied utilization-focused principles offer insights into the world of the internal evaluator and illuminate research findings about effective approaches to internal evaluation (Winberg 1991; Lyon 1989; Huberty 1988; Kennedy 1983).

Themes From Internal Evaluators

1. Actively involving stakeholders within the organization can be difficult because evaluation is often perceived by both superiors and subordinates as the job of the evaluator. The internal evaluator is typically expected to do evaluations, not facilitate an evaluation process involving others. Internal evaluators who have had success involving others have had to work hard at finding special incentives to attract participation in the evaluation process. One internal evaluator commented,

My director told me he doesn't want to spend time thinking about evaluation. That's why he hired me. He wants me to "anticipate his information needs." I've had to find ways to talk with him about his interests and information needs without explicitly telling him he's helping me focus the evaluation. I guess you could say I kind of involve him without his really knowing he's involved.

2. Internal evaluators are often asked by superiors for public relations information rather than evaluation. The internal evaluator may be told, "I want a report for the legislature proving our program is effective." It takes clear conviction, subtle diplomacy, and an astute understanding of how to help superiors appreciate evaluation to keep internal evaluation responsibilities from degenerating into public relations. One mechanism used by several internal evaluators to increase support for real evaluation rather than public relations is establishing an evaluation advisory committee, including influential people from outside the organization, to provide independent checks on the integrity of internal evaluation.

3. Internal evaluators get asked to do lots of little data-gathering and report-writing tasks that are quite time consuming but too minor to be considered meaningful evaluation. For example, if someone in the agency wants a quick review of what other states are doing about some problem, the internal evaluator is an easy target for the task. Such assignments can become so pervasive that it's difficult to have time for longer-term, more meaningful evaluation efforts.

4. Internal evaluators are often excluded from major decisions or so far removed from critical information networks that they don't know about new initiatives or developments in time to build in an evaluation perspective up front. One internal evaluator explained,

We have separate people doing planning and evaluation. I'm not included in the planning process and usually don't even see the plan until it's approved. Then they expect me to add on an evaluation. It's a real bitch to take a plan done without any thought of evaluation and add an evaluation without screwing up or changing the plan. They think evaluation is something you do at the end rather than think about from the start. It's damn hard to break through these perceptions. Besides, I don't want to do the planners' job, and they don't want to do my job, but we've got to find better ways of making the whole thing work together. That's my frustration. . . . It takes me constantly bugging them, and sometimes they think I'm encroaching on their turf. Some days I think, "Who needs the hassle?" even though I know it's not as useful just to tack on the evaluation at the end.

5. Getting evaluation used takes a lot of follow-through. One internal evaluator explained that her job was defined as data gathering and report writing without consideration of following up to see if report recommendations were adopted. That's not part of her job description, and it takes time and some authority. She commented,

How do I get managers to use a report if my job is just to write the report? But they're above me. I don't have the authority to ask them in six months what they've done. I wrote a follow-up memo once reminding managers about recommendations in an evaluation and some of them didn't like it at all, although a couple of the good ones said they were glad I reminded them.

Another internal evaluator told me he had learned how to follow up informally. He has seven years' experience as an internal human services evaluator. He said,

At first I just wrote a report and figured my job was done. Now, I tell them when we review the initial report that I'll check back in a few months to see how things are going. I find I have to keep pushing, keep reminding, or they get busy and just file the report. We're gradually getting some understanding that our job should include some follow-up. Mostly it's on a few things that we decide are really important. You can't do it all.

Internal Role Definitions

The themes from internal evaluators indicate the importance of carefully defining the job to include attention to use. When and if the internal evaluation job is defined primarily as writing a report and filling out routine reporting forms, the ability of the evaluator to influence use is quite limited. When and if the internal evaluator is organizationally separated from managers and planners, it is difficult to establish collaborative arrangements that facilitate use. Thus, a utilization-focused approach to internal evaluation will often require a redefinition of the position to include responsibility for working with intended users to develop strategies for acting on findings.

One of the most effective internal evaluation units I've encountered was in the U.S. Federal Bureau of Investigation (FBI). This unit reported directly to the bureau's deputy director. The evaluation unit director

had direct access to the director of the FBI in both problem identification and discussion of findings. The purpose of the unit was program improvement. Reports were written only for internal use; there was no public relations use of reports. Public relations was the function of a different unit. The internal evaluation staff was drawn from experienced FBI agents. They thus had high credibility with agents in the field. They also had the status and authority of the director's office behind them. The evaluation unit had an operations handbook that clearly delineated responsibilities and procedures. Evaluation proposals and designs were planned and reviewed with intended users. Multiple methods were used. Reports were written with use in mind. Six months after the report had been written and reviewed, follow-up was formally undertaken to find out if recommendations had been implemented. The internal evaluators had a strong commitment to improving FBI programs and clear authority to plan, conduct, and report evaluations in ways that would have an impact on the organization, including follow-up to make sure recommendations approved by the director were actually implemented.

Based on his experience directing the FBI's internal evaluation unit, Dick Sonnichsen (1988) formulated what he has called internal advocacy evaluation as a style of organizational development:

Internal evaluators have to view themselves as change agents and participants in policy formulation, migrating from the traditional position of neutrality to an activist role in the organizational decision-making process. The practice of Advocacy Evaluation positions internal evaluators to become active participants in developing and implementing organizational improvements. Operating under an advocacy philosophy, evaluation becomes a tool for change and a vehicle for evaluators to influence the organization. (p. 141; emphasis in original)

An evaluation advocate is not a cheerleader for the program, but rather, a champion of evaluation use. This is sometimes called persuasive use, in which "advocates work actively in the politics of the organization to get results used" (Caracelli and Preskill 1996).

The new evaluator is a program advocate—not an advocate in the sense of an ideologue willing to manipulate data and to alter findings to secure next year's funding. The new evaluator is someone who believes in and is interested in helping programs and organizations succeed. At times the program advocate evaluator will play the traditional critic role: challenging basic program assumptions, reporting lackluster performance, or identifying inefficiencies. The difference, however, is that criticism is not the end of performance-oriented evaluation; rather, it is part of a larger process of program and organizational improvement, a process that receives as much of the evaluator's attention and talents as the criticism function. (Bellavita et al. 1986:289)

The roles of champion, advocate, and change agent (Sonnichsen 1994) are just some of the many roles open to internal evaluators. Love (1991) has identified a number of both successful and unsuccessful roles for internal evaluators (see Exhibit 6.2). Carefully defining the role of internal evaluator is a key to effective and credible internal evaluation use.

Part of defining the role is labeling it in a meaningful way. Consider this reflection from internal school district evaluator Nancy Law (1996), whose office was re-

EXHIBIT 6.2
Successful and Unsuccessful Roles of Internal Evaluators

Successful roles: management consultant; decision support; management information resource; systems generalist; expert troubleshooter; advocate for/champion of evaluation use; systematic planner.

Unsuccessful roles: spy; hatchet carrier; fear-inspiring dragon; number cruncher; organizational conscience; organizational memory; public relations officer.

SOURCE: Adapted and expanded from Love 1991:9.

named from Research and Evaluation to Accountability Department.

This title change has become meaningful for me personally. It was easy to adopt the name, but harder to live up to it. . . .

Now, I am learning that I am the one accountable—that my job doesn't end when the report is finished and presented. My role continues as I work with others outside research to create changes that I have recommended. Oh, yes, we still perform the types of research/evaluation tasks done previously, but there is a greater task still to be done—that of convincing those who need to make changes to move ahead! Put simply, when we took a different name, we became something different and better. (p. 1)

Internal-External Evaluation Combinations

In workshops, I am often asked to compare the relative advantages and disadvantages of internal versus external evaluations. After describing some of the differences along the lines of the preceding discussion, I like to point out that the question is loaded by implying that internal and external approaches are mutually exclusive. Actually, there are a good many possible combinations of internal and external evaluations that may be more desirable and more cost-effective than either a purely internal or purely external evaluation.

Accreditation processes are a good example of an internal-external combination. The internal group collects the data and arranges them so that the external group can come in, inspect the data collected by the internal group, sometimes collect additional information on their own, and pass judgment on the program.

There are many ways in which an evaluation can be set up so that some external group of respected professionals and evaluators guarantees the validity and fairness of the evaluation process while the people internal to the program actually collect and/or analyze the evaluation data. The cost savings of such an approach can be substantial while still allowing the evaluation to have basic credibility and legitimacy through the blessing of the external review committee.

Being Active-Reactive-Adaptive • 143

I worked for several years with one of the leading chemical dependency treatment centers in the country, the Hazelden Foundation of Minnesota. The foundation has established a rigorous evaluation process that involves data collection at the point of entry into the program and then follow-up questionnaires 6 months, 12 months, and 24 months after leaving the program. Hazelden's own research and evaluation department collects all of the data. My responsibility as an external evaluator was to monitor that data collection periodically to make sure that the established procedures were being followed correctly. I then worked with the program decision makers to identify the kind of data analysis that was desirable. They performed the data analysis with their own computer resources. They sent the data to me, and I wrote the annual evaluation report. They participated in analyzing, interpreting, and making judgments about the data, but for purposes of legitimacy and credibility, the actual writing of the final report was done by me.

This internal/external combination is sometimes extended one step further by having still another layer of external professionals and evaluators pass judgment on the quality and accuracy of the evaluation final report through a meta-evaluation process—evaluating the evaluation based on the profession's standards and principles. Indeed, the revised standards for evaluation (Joint Committee 1994:A12) prescribe meta-evaluation so that stakeholders have an independent credible review of an evaluation's strengths and weaknesses. Such an effort will be most meaningful and cost-beneficial for large-scale summative evaluations of major policy importance.

When orchestrating an internal-external combination, one danger to watch for is that the external group may impose unmanageable and overwhelming data collection procedures on the internal people. I saw this happen in an internal-external model with a group of school districts in Canada. The external committee set as the standard doing "comprehensive" data collection at the local school level, including data on learning outcomes, staff morale, facilities, curriculum, the school lunch program, the library, parent reactions, the perceptions of local businesspeople, analysis of the school bus system, and so on. After listening to all of the things the external committee thought should be done, the internal folks dubbed it the Internal-External-Eternal model of evaluation.

The point is that a variety of internal-external combinations are possible to combine the lower costs of internal data collection with the higher credibility of external review. In working out the details of internal-external combinations, care will need to be taken to achieve an appropriate and mutually rewarding balance based on a collaborative commitment to the standards of utility, feasibility, propriety, and accuracy.

Evaluation as Results-Oriented Leadership

Most writings about internal evaluation assume a separate unit or specialized position with responsibility to conduct evaluations. An important new direction in evaluation is to treat evaluation as a leadership function of all managers and program directors in the organization. The person responsible for internal evaluation then plays a facilitative, resource, and training function in support of managers rather

EXHIBIT 6.3 Four Functions of Results-Oriented, Reality-Testing Leadership

• Create and nurture a results-oriented, reality-testing culture.

• Lead in deciding what outcomes to commit to and hold yourselves accountable for.
• Make measurement of outcomes thoughtful, meaningful, and credible.
• Use the results—and model for others serious use of results.

than spending time actually conducting evaluations. The best example of this approach I've worked with and observed up close was the position of Associate Administrator for Performance Measurement and Evaluation in Hennepin County, Minnesota (Minneapolis). The county had no internal evaluation office. Rather, this senior position, as part of the County Executive team, had responsibility to infuse evaluation systems throughout the county, in every department and program. The framework called for a results-oriented approach that was "thoughtful, useful, and credible" (Knapp 1995). Every manager in the county received training in how to build outcomes evaluation into ongoing program processes. Performance measurement was tied to reporting and budgeting systems. What made this approach to internal evaluation work, in my judgment, was threefold: (1) Results-oriented performance measurement was defined as a leadership function of every county manager, not just a reporting function to be delegated as far down as possible in the department; (2) the overall responsibility for evaluation resided at the highest level of the organization, in the executive team, with direct access to the County Board of Commissioners, backed up by public commitments to use evaluation for decision making and budgeting; and (3) because of the prior two commitments, a person of great competence and dedication was selected to fill the Associate Administrator for Performance Measurement and Evaluation position, after a national search.

These patterns of effectiveness stand out because so often internal evaluation is delegated to the lowest level in an organization and treated as a clerical function. Indeed, being given an evaluation assignment is often a form of punishment agency directors use, or a way of giving deadwood staff something meaningless to occupy themselves with. It is clear that, for internal evaluators to be useful and credible, they must have high status in the organization and real power to make evaluation meaningful.

Elevating the status of evaluation in this way is most likely to occur when evaluation is conceived of as a leadership function rather than a low-level clerical or data management task. Exhibit 6.3 presents the four functions of results-oriented leadership. In this framework, evaluation becomes a senior management responsibility focused on decision-oriented use rather than a data-collection task focused on routine internal reporting.

There is a downside to elevating the status and visibility of internal evaluators: They become more politically vulnerable. To complete the example cited above, when the county administrator of Hennepin County departed, the Associate Administrator for Performance Measurement and Evaluation position was terminated. Politically, the position was dependent on the county's chief executive officer. When that person left, the internal evaluation position became expendable as part of the subsequent political shakeout. As Chapter 15 will discuss, the Achilles' heel of utilization-focused evaluation is turnover of primary intended users.

Going Bananas

Before closing this chapter, it seems appropriate to provide a situation for situational analysis, a bit of a practice exercise, if you will. Consider, then, the evaluation relevance of the following story repeated from generation to generation by schoolchildren.

A man walking along the street notices another man on the other side with bananas in his ears. He shouts, "Hey, mister, why do you have bananas in your ears?" Receiving no response, he pursues the man, calling again as he approaches, "Pardon me, but why have you got bananas in your ears?" Again there is no response.
He catches up to the man, puts his hand on his shoulder, and says, "Do you realize you have bananas in your ears?"
The gentleman in question stops, looks puzzled, takes the bananas out of his ears, and says, "I'm sorry, what did you ask? I couldn't hear you because I have bananas in my ears."

Now for the situational analysis. How might you use this story with a group of intended users to make a point about the nature of evaluation? What point(s) could the story be used to illustrate (metaphorically)?
What are the implications of the story for evaluation under four different conditions:

1. If the man with the bananas in his ears is a stakeholder and the man in pursuit is an evaluator
2. If the banana man is an evaluator, and the man in pursuit is a stakeholder
3. If both are primary stakeholders and the evaluator observes this scene
4. Both are evaluators observed by a stakeholder

It is just such situational variations that make strategic, contingency thinking and evaluator responsiveness so important—and so challenging.

Beyond the Goals Clarification Game

Focusing on Outcomes

Mulla Nasrudin was a Sufi guru. A king who enjoyed Nasrudin's company, and also liked to hunt, commanded him to accompany him on a bear hunt. Nasrudin was terrified.
When Nasrudin returned to his village, someone asked him: "How did the hunt go?"
"Marvelously!"
"How many bears did you see?"
"None."
"How could it have gone marvelously, then?"
"When you are hunting bears, and you are me, seeing no bears at all is a marvelous experience."

—Shah 1964:61

Evaluation of the Bear Project

If this tale were updated by means of an evaluation report, it might read something like this:

Under the auspices of His Majesty's Ministry of the Interior, Department of Natural
Resources, Section on Hunting, Office of Bears, field observers studied the relationship
between the number of bears sighted on a hunt and the number of bears shot on a hunt.
Having hypothesized a direct, linear relationship between the sighting of bears and killing of bears, data were collected on a recent royal hunting expedition. The small sample size limits generalizability, but the results support the hypothesis at the 0.001 level of statistical significance. Indeed, the correlation is perfect. The number of bears sighted was zero and the number killed was zero. In no case was a bear killed without first being sighted. We therefore recommend new Royal regulations requiring that bears first be sighted before they are killed.

Respectfully submitted,
The Incomparable Mulla Nasrudin
Royal Evaluator

Whose Goals Will Be Evaluated?

Although Nasrudin's evaluation bears certain flaws, it shares one major trait with almost all other reports of this genre: Namely, it is impossible to tell whether it answers anyone's question. Who decided that the goal evaluated should be the number of bears killed? Perhaps the hunt's purpose was a heightened sensitivity to nature, or a closer relationship between Nasrudin and the king, or reducing Nasrudin's fear of bears, or an increase in the king's power over Nasrudin. It may even be possible (likely!) that different participants in the hunt had different goals. Nasrudin perceived a "marvelous" outcome. Other stakeholders, with different goals, might have concluded otherwise.

In utilization-focused evaluation, the primary intended users determine whose goals will be evaluated if they decide that
Beyond the Goals Clarification Game • 149

evaluating goal attainment will be the focus of the evaluation. There are other ways of focusing an evaluation, as we'll see, but first, let's review the traditional centrality of goal attainment in evaluation.

The Centrality of Goals in Evaluation

Traditionally, evaluation has been synonymous with measuring goal attainment (Morris and Fitz-Gibbon 1978). Peter Rossi (1972) has stated that "a social welfare program (or for that matter any program) which does not have clearly specified goals cannot be evaluated without specifying some measurable goals. This statement is obvious enough to be a truism" (p. 18). In a major review of the evaluation literature in education, Worthen and Sanders (1973) concluded that "if evaluators agree in anything, it is that program objectives written in unambiguous terms are useful information for any evaluation study" (p. 231). Carol Weiss (1972b) observed that

the traditional formulation of the evaluation question is: To what extent is the program succeeding in reaching its goals? . . . The goal must be clear so that the evaluator knows what to look for. . . . Thus begins the long, often painful process of getting people to state goals in terms that are clear, specific, and measurable. (pp. 74-76; emphasis in original)

As the preceding quotes illustrate, the evaluation literature is replete with serious treatises on the centrality of program goals, and this solemnity seems to carry over into evaluators' work with program staff. There may be no more deadly way to begin an evaluation effort than assembling program staff to identify and clarify program goals and objectives. If evaluators are second only to tax collectors in the hearts of program staff, I suspect that it is not because staff fear evaluators' judgments about program success, but because they hate constant questioning about goals.

The Goals Clarification Game

Evaluators frequently conduct goals clarification meetings like the Twenty Questions game played at parties. Someone thinks of an object in the room and then the players are allowed 20 questions to guess what it is. In the goals clarification game, the evaluator has an object in mind (a clear, specific, and measurable goal). Program staff are the players. The game begins with the staff generating some statement they think is a goal. The evaluator scrutinizes the statement for clarity, specificity, and measurability, usually judging the staff's effort inadequate. This process is repeated in successive tries until the game ends in one of three ways: (1) The staff gives up (so the evaluator wins and writes the program goals for staff); (2) the evaluator gives up (so the staff gets by with vague, fuzzy, and unmeasurable goals); or (3) in rare cases, the game ends when staff actually stumbles on a statement that reasonably approximates what the evaluator had in mind.

Why do program staff typically hate this game so much?

1. They have played the game hundreds of times, not just for evaluators, but for funders and advisory boards, in writing proposals, and even among themselves.

2. They have learned that, when playing the game with an evaluator, the evaluator almost always wins.
3. They come out of the game knowing that they appear fuzzy-minded and inept to the evaluator.
4. It is a boring game.
5. It is an endless game because each new evaluator comes to the game with a different object in mind. (Clarity, specificity, and measurability are not clear, specific, and measurable criteria, so each evaluator can apply a different set of rules in the game.)

Among experienced program staff, evaluators may run into countering strategies like the goals clarification shuffle. Like many dance steps (e.g., the Harlem shuffle, the hustle), this technique has the most grace and style when executed simultaneously by a group. The goals clarification shuffle involves a sudden change in goals and priorities after the evaluator has developed measuring instruments and a research design. The choreography is dazzling. The top-priority program goal is moved two spaces to either the right or left and four spaces backward. Concurrently, all other goals are shuffled with style and subtlety, the only stipulation being that the first goal end up somewhere in the middle, with other goals reordered by new criteria.

The goals clarification shuffle first came into national prominence in 1969 when it was employed as a daring counterthrust to the Westinghouse-Ohio State University Head Start Evaluation. That study evaluated cognitive and affective outcomes of the Head Start Program and concluded that Head Start was largely ineffective (Cicarelli 1971; Westinghouse Learning Corporation 1969). However, as soon as the final report was published, the goals clarification shuffle was executed before enthusiastic Congressional audiences, showing that Head Start's health, nutrition, resource redistribution, cultural, and community goals ought to have been in the spotlight (see Evans 1971:402; Williams and Evans 1969). Thus, despite negative evaluation findings, Congress expanded the Head Start program, and the evaluators were thrown on the defensive. (It was about this same time that serious concerns over nonuse of evaluation findings started to be heard on a national scale.)

Conflict Over Goals and the Delphi Counter

Not all goals clarification exercises resemble dances. Often, the more fitting metaphor is war. Conflict over program goals among different stakeholder groups is common. For example, in criminal justice programs, battles are waged over whether the purpose of a program is punitive (punish criminal offenders for wrongdoing), custodial (keep criminal offenders off the streets), or rehabilitative (return offenders to society after treatment). In education and training programs, conflicts often emerge over whether the priority goal is attitude change or behavior change. In welfare agencies, disagreements can be found over whether the primary purpose is to get clients off welfare or out of poverty, and whether the focus should be long-term change or short-term crisis intervention (Conte 1996). In health settings, staff dissension may emerge over the relative emphasis to be placed on preventive versus curative medical practice. Chemical dependency programs are often enmeshed in controversy over whether the desired outcome is sobriety or responsible use. Even police and fire departments can get caught in controversy about the purposes and actual effects of sirens, with critics arguing that they're more a nuisance than a help (Perlman 1996). Virtually any time a group of people assemble to determine program goals, conflict can emerge, resulting in a lengthy, frustrating, and inconclusive meeting.

For inexperienced evaluators, conflicts among stakeholders can be unnerving. Once, early in my career, a goals clarification session erupted into physical violence between a school board member and the district's internal evaluator. The novice evaluator can lose credibility by joining one side or the other. More experienced evaluators have learned to remain calm and neutral, sometimes suggesting that multiple goals be evaluated, thereby finessing the need for consensus about program priorities.

A more elaborate counter to goals conflict is the use of some kind of formal ranking approach such as the Delphi technique (Dalkey 1969; Helmer 1966), especially where there are large numbers of stakeholders and many possible priorities.

The Delphi technique, a method of developing and improving group consensus, was originally used at the Rand Corporation to arrive at reliable prediction about the future of technology; hence its oracular name. . . . Delphi essentially refers to a series of intensive interrogations of samples of individuals (most frequently, experts) by means of mailed questionnaires concerning some important problem or question; the mailings are interspersed with controlled feedback to the participants. The responses in each round of questioning are gathered by an intermediary, who summarizes and returns the information to each participant, who may then revise his own opinions and ratings. . . . However antagonistic the initial positions and complex the questions under analysis—competing opinions apparently converge and synthesize when this technique is used. (Rosenthal 1976:121)

The trick to managing conflict with this technique is that the stakeholders never meet face to face. Thus, disagreements and arguments never get a chance to surface on an interpersonal level. Individual responses remain confidential.

The technique has proved so successful in producing consensus . . . it is now often adopted in many kinds of situations where convergence of opinion is advisable or desirable . . . avoiding as it does the sundry prima donna behaviors that may vitiate roundtable discussions. (Rosenthal 1976:121-22)

The strength of the Delphi approach—lack of face-to-face interaction—is also its weakness. The process fails to deal with real stakeholder power issues and divergent interests. If those issues aren't dealt with early in the evaluation, they will likely resurface later and threaten the evaluation's credibility and utility.

In some instances, an evaluator may encounter open warfare over goals and values. A "goals war" usually occurs when two or more strong coalitions are locked in battle to determine which group will control the future direction of some public policy or program. Such wars involve highly emotional issues and deeply held values, such as conflicting views on abortion or sex education for teenagers.

Evaluation of school busing programs to achieve racial balance offers an example rich with conflict. By what criteria ought busing programs be evaluated? Changed racial attitudes? Changed interracial behaviors? Improved student achievement? Degree of parent involvement? Access to educational resources? All are candidates for

the honor of primary program goal. Is school busing supposed to achieve desegregation (representative proportions of minority students in all schools) or integration (positive interracial attitudes, cooperation, and interaction)? Many communities, school boards, and school staffs are in open warfare over these issues. Central to the battles fought are basic disagreements about what evaluation criteria to apply (see Cohen and Weiss 1977; Cohen and Garet 1975).

Evaluability Assessment and Goals Clarification

Evaluators have gotten heavily involved in goals clarification because, when we are invited in, we seldom find a statement of clear, specific, prioritized, and measurable goals. This can take novice evaluators by surprise if they think that their primary task will be formulating an evaluation design for already established goals. Even where goals exist, they are frequently unrealistic, having been exaggerated to secure funding. One reason evaluability assessment has become an important preevaluation tool is that, by helping programs get ready for evaluation, it acknowledges the common need for a period of time to work with program staff, administrators, funders, and participants on clarifying goals—making them realistic, meaningful, agreed on, and evaluable (Wholey 1994; Smith 1989). Evaluability assessment often includes fieldwork and interviews to determine how much consensus there is among various stakeholders about goals and to identify where differences lie. Based on this kind of contextual analysis, an evaluator can work with primary intended users to plan a strategy for goals clarification.

When an evaluability assessment reveals broad aims and fuzzy goals, it's important to understand what role goals are understood to play in the program. Fuzzy goals actually characterize much human cognition and reasoning (Zadeh et al. 1975:ix). Laboratory experiments suggest that fuzzy conceptualizing may be typical of half the population (Kochen 1975:407). No wonder evaluators have so much trouble getting clear, specific, and measurable goals! Carol Weiss (1972b) has commented in this regard:

Part of the explanation [for fuzzy goals] probably lies in practitioners' concentration on concrete matters of program functioning and their pragmatic mode of operation. They often have an intuitive rather than an analytical approach to program development. But there is also a sense in which ambiguity serves a useful function; it may mask underlying divergences in intent. . . glittering generalities that pass for goal statements are meant to satisfy a variety of interests and perspectives. (p. 27)

Thus, evaluators have to figure out if administrators and staff are genuinely fuzzy about what they're attempting to accomplish, or if they're simply being shrewd in not letting the evaluator (or others) discover their real goals, or if they're trying to avoid conflict through vagueness.

Fuzzy goals, then, may be a conscious strategy for avoiding an outbreak of goals wars among competing or conflicting interests. In such instances, the evaluation may be focused on important questions, issues, and concerns without resort to clear, specific, and measurable objectives. However, more often than not in my experience, the difficulty turns out to be a conceptual problem rather than deviousness.

From a utilization-focused point of view, the challenge is to calculate how early interactions in the evaluation process will affect later use. Typically, it's not useful to ignore goals conflict, accept poorly formulated or unrealistic goals, or let the evaluator assume responsibility for writing clear, specific, and measurable goals. Primary intended users need to be involved in assessing how much effort to put into goals clarification. In doing so, both evaluators and primary intended users do well to heed the evaluation standard on political viability:

The evaluation should be planned and conducted with anticipation of the different positions of various interest groups, so that their cooperation may be obtained, and so that possible attempts by any of these groups to curtail evaluation operations or to bias or misapply the results can be averted or counteracted. (Joint Committee on Standards 1994:F2)

There are alternatives to goals-based evaluation, alternatives we'll consider in the next chapter. First, let's examine how to work with intended users who want to focus on goals and results.

Communicating About Goals and Results

Part of the difficulty, I am convinced, is the terminology: goals and objectives. These very words can intimidate staff. Goals and objectives have become daunting weights that program staff feel around their necks, burdening them, slowing their efforts, and impeding rather than advancing their progress. Helping staff clarify their purpose and direction may mean avoiding use of the term goals and objectives.

I've found program staff quite animated and responsive to the following kinds of questions: What are you trying to achieve with your clients? If you are successful, how will your clients be different after the program than they were before? What kinds of changes do you want to see in your clients? When your program works as you want it to, how do clients behave differently? What do they say differently? What would I see in them that would tell me they are different? Program staff can often provide quite specific answers to these questions, answers that reveal their caring and involvement with the client change process, yet when the same staff are asked to specify their goals and objectives, they freeze.

After querying staff about what results they hope to accomplish with program participants, I may then tell them that what they have been telling me constitutes their goals and objectives. This revelation often brings considerable surprise. They often react by saying, "But we haven't said anything about what we would count." This, as clearly as anything, I take as evidence of how widespread the confusion is between the conceptualization of goals and their measurement. Help program staff and other intended users be realistic and concrete about goals and objectives, but don't make them hide what they are really trying to do because they're not sure how to write a formally acceptable statement of goals and objectives, or because they don't know what measurement instruments might be available to get at some of the important things they are trying to do. Instead, take them through a process that focuses on achieving outcomes and results rather than writing goals. The difference, it turns out, can be huge.

Focusing on Outcomes and Results

In the minds of many program people, from board members to front-line staff and participants, goals are abstract statements of ideals written to secure funding—meant to inspire, but never achieved. Consider this poster on the wall of the office of a program I evaluated: The greatest danger is not that we aim too high and miss, but that our goal is too low and we attain it. For the director of this program, goals were something you put in proposals and plans, and hung on the wall, then went about your business.

Let me illustrate the difference between traditional program goals and a focus on participant outcomes with plans submitted by county units to the Minnesota Department of Human Services (MDHS).1 The plans required statements of outcomes. Each statement below promises something, but that something is not a change in client functioning, status, or well-being. These statements reveal how people in social services have been trained to think about program goals. My comments, following each goal, are meant to illustrate how to help program leaders and other intended evaluation users reframe traditional goals to focus on participant outcomes.

Problematic Outcome Statements

1. To continue implementation of a case management system to maintain continued contact with clients before, during, and after treatment.
Comment: Continued implementation of the system is the goal. And what is promised for the client? "Continued contact."

2. Case management services will be available to all persons with serious and persistent mental illness who require them.

Comment: This statement aims at availability—a service delivery improvement. Easily accessible services could be available 24 hours a day, but with what outcomes?

3. To develop needed services for chronically chemically dependent clients.

Comment: This statement focuses on program services rather than the client outcomes. My review of county plans revealed that most managers focus planning at the program delivery level, that is, the program's goals, rather than how clients' lives will be improved.

4. To develop a responsive, comprehensive crisis intervention plan.

Comment: A plan is the intended outcome. I found that many service providers confuse planning with getting something done. The characteristics of the plan—"responsive, comprehensive"—reveal nothing about results for clients.

5. Develop a supportive, family-centered, empowering, capacity-building intervention system for families and children.

Comment: This goal statement has all the latest human services jargon, but, carefully examined, the statement doesn't commit to empowering any families or actually enhancing the capacity of any clients.

6. Expand placement alternatives.

Comment: More alternatives is the intended result, but to what end? Here is another system-level goal that carries the danger of making placement an end in itself rather than a means to client improvement.

7. County clients will receive services which they value as appropriate to their needs and helpful in remediating their concerns.

Comment: Client satisfaction can be an important outcome, but it's rarely sufficient by itself. Especially in tax-supported programs, taxpayers and policymakers want more than happy clients. They want clients to have jobs, be productive, stay sober, parent effectively, and so on. Client satisfaction needs to be connected to other desired outcomes.

8. Improve ability of adults with severe and persistent mental illness to obtain employment.

Comment: Some clients remain for years in programs that enhance their ability to obtain employment—without ever getting a job.

9. Adults with serious and persistent mental illness will engage in a process to function effectively in the community.

Comment: Engaging in the process is as much as this aims for, in contrast to clients actually functioning effectively in the community.

10. Adults with developmental disabilities will participate in programs to begin making decisions and exercising choice.

Comment: Program participation is the stated focus. This leads to counting how many people show up rather than how many make meaningful decisions and exercise real choice. A client can participate in a program aimed at teaching decision-making skills, and can even learn those skills, yet never be permitted to make real decisions.

11. Each developmentally disabled consumer (or their substitute decision maker) will identify ways to assist them to remain connected, maintain, or develop natural supports.

Comment: This goal is satisfied, as written, if each client has a list of potential connections. The provider, of course, can pretty much guarantee composition of such a list. The real outcome: Clients who are connected to a support group of people.

12. Adults in training and rehab will be involved in an average of 120 hours of community integration activities per quarter.

Comment: Quantitative and specific, but the outcome stated goes only as far as being involved in activities, not actually being integrated into the community.

13. Key indicators of intended results and client outcomes for crisis services:
• Number of patients served
• Number of patient days and the average length of stay
• Source of referrals to the crisis unit and referrals provided to patients at discharge

Comment: Participation numbers, not client outcomes.

14. Minimize hospitalizations of people with severe and persistent mental illness.

Comment: This is a system-level outcome that is potentially dangerous. One of the premises of results-oriented management reviewed in Chapter 1 is that "what gets measured gets done." An easy way to attain this desired outcome is simply not to refer or admit needy clients to the hospital. That will minimize hospitalizations (a system-level outcome) but may not help clients in need. A more appropriate outcome focus would be that these clients function effectively. If that outcome is attained, they won't need hospitalizations.

15. Improve quality of child protection intervention services.

Comment: I found a lot of outcome statements aimed at enhancing quality. Ironically, quality can be enhanced by improving services without having an impact on client outcomes. Licensing and accrediting standards often focus on staff qualifications and site characteristics (indicators of quality), but seldom require review of what program participants achieve.

The point of reviewing these examples has been to show the kinds of goal statements an evaluator may encounter when beginning to work with a program. A utilization-focused evaluator can help intended users review plans and stated goals to see if they include an outcomes focus. There's nothing wrong with program level (e.g., improve access or quality) or system level (e.g., reduce costs) goals, but such goals ought to connect to outcomes for clients. An evaluator can facilitate discussion of why, in the current political environment, one hears increased demand for "outcomes-based" management and program funding (MDHS 1996; Behn 1995; ICMA 1995; Knapp 1995; Schalock 1995; Schorr 1993; Brizius and Campbell 1991; Williams, Webb, and Phillips 1991; Carver 1990). Evaluators need to provide technical assistance in helping program planners, managers, and other potential evaluation users understand the difference between a participant outcomes approach and traditional program or system goals approaches. In particular, they need assistance understanding the difference between service-focused goals versus client-focused outcome goals. Exhibit 7.1 compares these two kinds of goals. Both can be useful, but they place emphasis in different places.

EXHIBIT 7.1
Service-Focused Versus Client-Focused Outcome Evaluation: Examples From Parenting Programs

Service-focused: Provide coordinated case management services with public health to pregnant adolescents
Client-focused outcome: Pregnant adolescents will give birth to healthy babies and care for the infants and themselves appropriately

Service-focused: Improve the quality of child protection intervention services
Client-focused outcome: Children will be safe; they will not be abused or neglected

Service-focused: Develop a supportive, family-centered, capacity-building intervention system for families and children
Client-focused outcome: Parents will adequately care and provide for their children

Service-focused: Provide assistance to parents to make employment-related child care decisions
Client-focused outcome: Parents who wish to work will have adequate child care

Leading a Horse to Water Versus Getting It to Drink

The shift from service goals to outcomes often proves difficult in programs and agencies that have a long history of focusing on services and activities. But even where the difference is understood and appreciated, some fear or resistance may emerge. One reason is that service providers are well schooled in the proverbial wisdom that "you can lead a horse to water, but you can't make it drink."

This familiar adage illuminates the challenge of committing to outcomes. The desired outcome is that the horse drink the water. Longer-term outcomes are that the horse stays healthy and works effectively. But because program staff know they can't make a horse drink water, they focus on the things they can control: leading the horse to water, making sure the tank is full, monitoring the quality of the water, and keeping the horse within drinking distance of the water. In short, they focus on the processes of water delivery rather than the outcome of water drunk. Because staff can control processes but cannot guarantee attaining outcomes, government rules and regulations get written specifying exactly how to lead a horse to water. Funding is based on the number of horses led to water. Licenses are issued to individuals and programs that meet the qualifications for leading horses to water. Quality awards are made for improving the path to the water—and keeping the horse happy along the way.

Whether the horse drinks the water gets lost in all this flurry of lead-to-water-ship. Most reporting systems focus on how many horses get led to the water, and how difficult it was to get them there, but never quite get around to finding out whether the horses drank the water and stayed healthy.

One point of resistance to outcomes accountability, then, is the fear among providers and practitioners that they're being asked to take responsibility for, and will be judged on, something over which they have little control. The antidote to this fear is building into programming incentives for attaining outcomes and establishing a results-oriented culture in an organization or agency. Evaluators have a role to play in such efforts by facilitating a process that helps staff, administrators, and other stakeholders think about, discuss the implications of, and come to understand both the advantages and limitations of an outcomes approach. There's a lot of managerial and political rhetoric about being results oriented, but not much expertise in how to set up a results-oriented system. The next section presents a framework for conceptualizing outcomes that are meaningful and measurable for use in facilitating an outcomes-oriented management and evaluation system.

Utilization-Focused Outcomes Framework

This framework distinguishes six separate elements that need to be specified for focusing an evaluation on participant or client outcomes:

• a specific participant or client target group
• the desired outcome(s) for that target group
• one or more indicators for each desired outcome
• details of data collection
• how results will be used
• performance targets

I'll discuss each of these elements and offer illustrations from actual programs to show how they fit together. Evaluators can use this framework to work with primary intended users.

Identifying Specific Participant or Client Target Groups

I'll use the generic term client to include program participants, consumers of services, beneficiaries, students, and customers, as well as traditional client groups. The appropriate language varies, but for every program, there is some group that is expected to benefit from and attain outcomes as a result of program participation. However, the target groups identified in enabling legislation or existing reporting systems typically are defined too broadly for meaningful outcomes measurement. Intended outcomes can vary substantially for subgroups within general eligible populations. The trick is to be as specific as necessary to conceptualize meaningful outcomes. Some illustrations may help clarify why this is so.

Consider a program aimed at supporting the elderly to continue living in their homes, with services ranging from "meals on wheels" to home nursing. Not all elderly people can or want to stay in their homes. Therefore, if the desired outcome is "continuing to live in their own home," it would be inappropriate to specify that outcome for all elderly people. A more appropriate target population, then, would be people over the age of 55 who want to and can remain safely in their homes. For this group, it is appropriate to aim to keep them

in their homes. It is also clear that some kind of screening process will be necessary to identify this subpopulation of the elderly.

A different example comes from programs serving people with developmental disabilities (DD). Many programs exist to prepare DD clients for work and then support them in maintaining employment. However, not all people with developmental disabilities can or want to work. In cases where funding supports the right of DD clients to choose whether to work, the appropriate subpopulation becomes people with developmental disabilities who can and want to work. For that specific subpopulation, then, the intended outcome could be that they obtain and maintain satisfying employment.

There are many ways of specifying subpopulation targets. Outcomes are often different for young, middle-aged, and elderly clients in the same general group (e.g., persons with serious and persistent mental illness). Outcomes for pregnant teens or teenage mothers may be different from outcomes for mothers receiving welfare who have completed high school. Outcomes for first-time offenders may be different from those for repeat offenders. The point is that categories of funding eligibility often include subgroups for whom different outcomes are appropriate. Similarly, when identifying groups by services received, for example, counseling services or jobs training, the outcomes expected for generic services may vary by subgroups. It is important, then, to make sure an intended outcome is meaningful and appropriate for everyone in the identified target population.

Specifying Desired Outcomes

The choice of language varies under different evaluation approaches. Some models refer to expected outcomes or intended outcomes. Others prefer the language of client goals or client objectives. What is important is not the phrase used but that there be a clear statement of the targeted change in circumstances, status, level of functioning, behavior, attitude, knowledge, or skills. Other outcome types include maintenance and prevention. Exhibit 7.2 provides examples of outcomes.

Outcome Indicators

An indicator is just that, an indicator. It's not the same as the phenomenon of interest, but only an indicator of that phenomenon. A score on a reading test is an indicator of reading ability but should not be confused with a particular person's true ability. All kinds of things affect a test score on a given day. Thus, indicators are inevitably approximations. They are imperfect and vary in validity and reliability.

The resources available for evaluation will greatly affect the kinds of data that can be collected for indicators. For example, if the desired outcome for abused children is that there be no subsequent abuse or neglect, periodic in-home visitation and observation, including interviews with the child, parent(s), and knowledgeable others, would be desirable, but such data collection is expensive. With constrained resources, one may have to rely on routinely collected data, that is, official substantiated reports of abuse and neglect over time. Moreover, when using such routine data, privacy and confidentiality restrictions may limit the indicator to aggregate results quarter by quarter rather than one that tracks specific families over time.

As resources change, the indicator may change. Routine statistics may be used by an agency until a philanthropic foundation

EXHIBIT 7.2
Outcome Examples

Type of Change Illustration

Change in circumstances Children safely reunited with their families of origin from foster care
Change in status Unemployed to employed
Change in behavior Truants will regularly attend school
Change in functioning Increased self-care; getting to work on time
Change in attitude Greater self-respect
Change in knowledge Understand the needs and capabilities of children at different ages
Change in skills Increased reading level; able to parent appropriately
Maintenance Continue to live safely at home (e.g., the elderly)
Prevention Teenagers will not use drugs

funds a focused evaluation to get better data for a specific period of time. In such a case, the indicator would change, but the desired outcome would not. This is the advantage of clearly distinguishing the desired outcome from its indicator. As the state of the art of measurement develops or resources change, indicators may improve without changing the desired outcome.

Time frames also affect indicators. The ultimate goal of a program for abused children would be to have them become healthy, well-functioning, and happy adults, but policymakers cannot wait 10 to 15 years to assess the outcomes of a program for abused children. Short-term indicators must be relied on, things like school attendance, school performance, physical health, and the psychological functioning of a child. These short-term indicators provide sufficient information to make judgments about the likely long-term results. It takes 30 years for a forest to grow, but you can assess the likelihood of ending up with a forest by evaluating how many saplings are still alive a year after the trees are planted.

Another factor affecting indicator selection is the demands data collection will put on program staff and participants. Short-term interventions such as food shelves, recreational activities for people with developmental disabilities, drop-in centers, and one-time community events do not typically engage participants intensely enough to justify collection of much, if any, data. Many programs can barely collect data on end-of-program status, much less follow-up data.

In short, a variety of factors influence the selection of indicators, including the importance of the outcome claims being made, resources available for data collection, the state of the art of measurement of human functioning, the nature of decisions to be made with the results, and the willingness of staff and participants to engage in assessment. Some kind of indicator is necessary, however, to measure degree of outcome attainment. The key is to make

sure that the indicator is a reasonable, useful, and meaningful measure of the intended client outcome.

The framework offered here will generate outcome statements that are clear, specific, and measurable, but getting clarity and specificity is separated from selecting measures. The reason for separating the identification of a desired outcome from its measurement is to ensure the utility of both. This point is worth elaborating. The following is a classic goal statement:

Student achievement test scores in reading will increase one grade level from the beginning of first grade to the beginning of second grade.

Such a statement mixes together and potentially confuses the (1) specification of a desired outcome with (2) its measurement and (3) the desired performance target. The desired outcome is increased student achievement. The indicator is a norm-referenced standardized achievement test. The performance target is one year's gain on the test. These are three separate decisions that primary intended evaluation users need to discuss. For example, there are ways other than standardized tests for measuring achievement, for example, student portfolios or competency-based tests. The desired outcome should not be confused with its indicator. In the framework offered here, outcome statements are clearly separated from operational criteria for measuring them.

Another advantage of separating outcomes identification from indicator selection is to encourage program staff to be serious about the process. A premature focus on indicators may be heard as limiting a program to attempt only those things that staff already know how to measure. Such a limitation is too constraining. It is one thing to establish a purpose and direction for a program. It is quite another thing to say how that purpose and direction are to be measured. By confusing these two steps and making them one, program goals can become detached from what program staff and funders are actually working to accomplish. Under such a constraint, staff begin by figuring out what can be measured. Given that they seldom have much expertise in measurement, they end up counting fairly insignificant behaviors and attitudes that they can somehow quantify.

When I work with groups on goals clarification, I have them state intended outcomes without regard to measurement. Once they have stated as carefully and explicitly as they can what they want to accomplish, then it is time to figure out what indicators and data can be collected to monitor outcome attainment. They can then move back and forth between conceptual-level statements and operational (measurement) specifications, attempting to get as much precision as possible in both.

To emphasize this point, let me overstate the trade-off. I prefer to have soft or rough measures of important goals rather than highly precise, quantitative measures of goals that no one much cares about. In too many evaluations, program staff are forced to focus on the latter (meaningless but measurable goals) instead of on the former (meaningful goals with soft measures).

Of course, this trade-off, stated in stark terms, is only relative. It is desirable to have as much precision as possible. By separating the process of goals clarification from the process of selecting goal indicators, it is possible for program staff to focus first on what they are really trying to accomplish and to state their goals and objectives as explicitly as possible without regard to measurement, and then to worry about

how one would measure actual attainment of those goals and objectives.

Performance Targets

A performance target specifies the amount or level of outcome attainment that is expected, hoped for, or, in some kinds of performance contracting, required. What percentage of participants in employment training will have full-time jobs six months after graduation: 40%? 65%? 80%? What percentage of fathers failing to make child support payments will be meeting their full child support obligations within six months of intervention: 15%? 35%? 60%?

The best basis for establishing future performance targets is past performance. "Last year we had 65% success. Next year we aim for 70%." Lacking data on past performance, it may be advisable to wait until baseline data have been gathered before specifying a performance target. Arbitrarily setting performance targets without some empirical baseline may create artificial expectations that turn out unrealistically high or embarrassingly low. One way to avoid arbitrariness is to seek norms for reasonable levels of attainment from other, comparable programs, or to review the evaluation literature for parallels.

As indicators are collected and examined over time, from quarter to quarter and year to year, it becomes more meaningful and useful to set performance targets. The relationship between resources and outcomes can also be more precisely correlated longitudinally, with trend data, all of which increases the incremental and long-term value of an outcomes management system. The challenge is to make performance targets meaningful.

In a political environment of outcomes mania, meaningfulness and utility are not necessarily priorities. Consider this example and judge for yourself. The 1995 Annual Management Report from the Office of the New York City Mayor included this performance target: The average daytime speed of cars crossing from one side of midtown Manhattan to the other will increase from 5.3 to 5.9 miles per hour. Impressed by this vision of moving from a "brisk 5.3" to a "sizzling 5.9," The New Yorker magazine interviewed Ruben Ramirez, Manhattan's Department of Transportation Traffic Coordinator, to ask how such a feat could be accomplished in the face of downsizing and budget cuts. Ramirez cited better use of resources. Asked what he could accomplish with adequate resources, he replied: "I think we could do six or seven, and I'm not being outrageous." The New Yorker found such a performance target a "dreamy future," one in which it might actually be possible to drive across midtown Manhattan faster than you can walk ("Speed" 1995:40).

Is such a vision visionary? Is a performance increase from 5.3 to 5.9 miles per hour meaningful? Is 6 or 7 worth aiming for? For a noncommuting Minnesotan, such numbers fail to impress. But, converted into annual hours and dollars saved for commercial vehicles in Manhattan, the increase may be valued in hundreds of thousands of dollars, perhaps even millions. It's for primary stakeholders in Manhattan, not Minnesota, to determine the meaningfulness of such a performance target.

Details of Data Collection

The details of data collection are a distinct part of the framework; they must be

attended to, but they shouldn't clutter the focused outcome statement. Unfortunately, I've found that people can get caught up in the details of refining methods and lose sight of the outcome. The details typically get worked out after the other parts of the framework have been conceptualized. Details include answering the following kinds of questions:

• What existing data will be used and how will they be accessed? Who will collect new indicators data?
• Who will have oversight and management responsibility for data collection?
• How often will indicators data be collected? How often reported?
• Will data be gathered on all program participants or only a sample? If a sample, how selected?
• How will findings be reported? To whom? In what format? When? How often?

These pragmatic questions put flesh on the bones of the outcomes framework. They are not simply technical issues, however. How these questions get answered will ultimately determine the credibility and utility of the entire approach. Primary intended users need to be involved in making decisions about these issues to ensure that they feel ownership of and responsibility for all aspects of the evaluation.

How Results Will Be Used

The final element in the framework is to make sure that the data collected on the outcomes identified will be useful. This means engaging intended users in a simulation exercise in which they pretend that they have results and are interpreting and using those results. The evaluation facilitator asks: If the results came out this way, what would you do? If the findings came out this other way, what would that tell you, and what actions would you take? Given what you want the evaluation to accomplish, have we focused on the right outcomes and useful indicators? At every stage of a utilization-focused evaluation, the evaluator facilitator pushes intended users to think seriously about the implications of design and measurement decisions for use.

Interconnections Among the Distinct Parts of the Framework

The utilization-focused outcomes framework, as just reviewed, consists of six parts: a specific participant target group; a desired outcome for that group; one or more outcome indicators; a performance target (if appropriate and desired); details of data collection; and specification of how findings will be used. While these are listed in the order in which intended users and staff typically conceptualize them, the conceptualization process is not linear. Groups often go back and forth in iterative fashion. The target group may not become really clear until the desired outcome is specified or an indicator designated. Sometimes formulating the details of data collection will give rise to new indicators, and those indicators force a rethinking of how the desired outcome is stated. The point is to end up with all elements specified, consistent with each other, and mutually reinforcing. That doesn't necessarily mean marching through the framework lockstep.

Exhibit 7.3 provides an example of all the elements specified for a parenting program aimed at high school-age mothers. Completing the framework often takes several tries. Exhibit 7.4 shows three versions of the utilization-focused outcomes

EXHIBIT 7.3
Example of a Fully Specified Utilization-Focused Outcome Framework

Target subgroup: Teenage mothers at Central High School

Desired outcome: Appropriate parenting knowledge and practices

Outcome indicator: Score on Parent Practice Inventory (knowledge and behavior scales)

Data collection: Pre- and post-test, beginning and end of program; six-month follow-up; district evaluation office will administer and analyze results

Performance target: 75% of entering participants will complete the program and attain a passing score on both the knowledge and behavior scales

Use: The evaluation advisory task force will review the results (principal, two teachers, two participating students, one agency representative, one community representative, an associate superintendent, one school board member, and the district evaluator). The task force will decide if the program should be continued at Central High School and expanded to other district high schools. A recommendation will be forwarded to the superintendent and school board.

framework as it emerged from the work of a developmental disabilities staff group. Their first effort yielded a service-oriented goal. They revised that with a focus on skill enhancement. Finally, they agreed on a meaningful client outcome: functioning independently.

A Utilization-Focused Process for Developing Outcomes

A central issue in implementing an outcomes evaluation approach is who will be involved in the process of developing the outcomes. When the purpose is ongoing management by outcomes, the program's executives and staff must buy into the process. Who else is involved is a matter of political judgment. Those involved will feel the most ownership of the resulting system.

Some processes involve only managers and directors. Other processes include advisory groups from the community. Collaboration between funders and service providers in determining outcomes is critical where contracts for services are involved. Advice from some savvy foundation funders is to match outcomes evaluation to the stage of a program's development (Daniels 1996), keep the context long-term (Mcintosh 1996), and "turn outcome 'sticks' into carrots" (Leonard 1996:46). Exhibit 7.5 shows the stages of a utilization-focused approach to developing an outcomes-based management system for a program (MDHS 1996). Critical issues and
EXHIBIT 7.4
Three Versions of a Utilization-Focused Outcomes Framework (Developmental Disabilities Staff Group)

EXHIBIT 7.5
Stages of a Utilization-Focused Approach to Developing an Outcomes-Based Management System

[Exhibit tables not legible in this copy.]

parallel activities are shown for each stage. Those to be involved will need training and support. I've found it helpful to begin with an overview of the purpose of an outcomes-focused programming approach: history, trends, the political climate, and potential benefits. Then I have participants work in small groups on the elements of the utilization-focused outcomes framework (see Exhibit 7.3) for an actual program with which they're familiar. Facilitation, encouragement, and technical assistance are needed to help such groups successfully complete the task. Where multiple groups are involved, I like to have them share their work and the issues that emerged in using the outcomes framework. It's important that those involved get a chance to raise their concerns openly. There's often suspicion about political motives. Providers worry about funding cuts and being held accountable for things they can't control. Administrators and directors of programs worry about how results will be used, what comparisons will be made, and who will control the process. Line staff worry about the amount of time involved, paperwork burdens, and the irrelevancy of it all. State civil servants responsible for reporting to the Legislature worry about how data can be aggregated at the state level. These and other concerns need to be aired and addressed. Having influential leaders visibly involved in the process enhances their own understanding and commitment while also sending signals to others about the importance being placed on outcomes.

Meaningful and Useful Goals

With the utilization-focused outcomes framework as background, here are 10 guidelines for working with intended users to identify meaningful and useful goals.

1. Distinguish between outcome goals and activities. Outcomes describe desired impacts of the program on participants: Students will read with understanding. Participants will stop smoking. Activities goals describe how outcome goals will be achieved: Students will read two hours a day. Participants will openly discuss their dependence on cigarettes. People in the program will be treated with respect.

2. Outcome goals should be clearly outcome oriented. Program staff often write activity goals thinking that they have stated desired outcomes. An agricultural extension agent told me his goal was "to get 50 farmers to participate in a farm tour." But what, I asked, did he want to result from the farm tour? After some dialogue, it became clear that the outcome goal was this: "Farmers will adopt improved milking practices in their own farm operations."

A corporation stated one of its goals for the year as "establishing a comprehensive energy conservation program." After we discussed that it was perfectly possible to establish such a program without ever saving any energy, they rewrote the goal: "The corporation will significantly reduce energy consumption."

3. It should be possible to conceptualize either the absence of the desired outcome or an alternative to it. Some goal statements are amazingly adept at saying nothing. I worked with a school board whose overall goal was "Students will learn." There is no way not to attain this goal. It is the nature of the species that young people learn. Fortunately, they can learn in spite of the schools. The issues are what and how much they will learn.

Another favorite is "increasing awareness." It's fairly difficult to put people through two weeks of training on some topic (e.g., chemical dependency) and not increase awareness. Under these conditions, the goal of "increasing awareness of chemical dependency issues" is hardly worth aiming at. Further dialogue revealed that the program staff wanted to change knowledge, attitudes, and behavior.

4. Each goal and objective should contain only one idea. There is a tendency in writing goal statements to overload the content.

5. The statement of goals and objectives should be understandable. Goals should communicate a clear sense of direction. Avoid difficult grammatical constructions and complex interdependent clauses. Goal statements should also avoid internal program or professional jargon. The general public should be able to make sense of goals. Consider these two versions of goal statements for what amount to the same outcome:

    (a) To maximize the capabilities of professional staff and use taxpayer resources wisely while engaging in therapeutic interventions and case management processes so that children's developmental capacities are unencumbered by adverse environmental circumstances or experiences.

    (b) Children will be safe from abuse and neglect.

Now, see if you can make sense of this beauty from the National Council of Teachers of English and the International Reading Association: "Students employ a wide range of strategies as they write and use different writing process elements appropriately to communicate with different audiences for a variety of purposes." The New York Times (1996) found this goal less than inspiring or user-friendly, and editorialized: "a fog of euphemism and evasion" (p. A24). Bumper sticker: Honk if you use writing process elements appropriately.

6. Formal goals statements should focus on the most important program outcomes. Writing goals should not be a marathon exercise in seeing how long a document one can produce. As human beings, our attention span is too short to focus on long lists of goals and objectives. Limit them to outcomes that matter and for which the program intends to be held accountable.

7. Keep goal statements separate from statements of how goals are to be attained. An agricultural extension program had this goal: "Farmers will increase yields through the educational efforts of extension including farm tours, bulletins, and related activities." Everything after the word yields describes how the goal is to be attained. Keep the goal focused, clear, and crisp.

8. Separate goals from indicators. Advocates of management by objectives and behavioral objectives often place more emphasis on measurement than on establishing a clear sense of direction (Combs 1972). The two are related, but not equivalent.

9. Make the writing of goals a positive experience. Goals clarification exercises are so often approached as pure drudgery that staff hate not only the exercise itself but also the resulting goals. Goals clarification should be an invigorating process of prioritizing what those involved care about and hope to accomplish. Goals should not

become a club for assaulting staff but a tool for helping staff focus and realize their ideals.

10. Thou shalt not covet thy neighbor's goals and objectives. Goals and objectives don't travel very well. They often involve matters of nuance. It is worth taking the time for primary stakeholders to construct their own goals so that they reflect their own values, expectations, and intentions in their own language.

There are exceptions to all of these guidelines, particularly the last one. One option in working with groups is to have them review the goals of other programs, both as a way of helping stakeholders clarify their own goals and to get ideas about format and content. Evaluators who work with behavioral objectives often develop a repertoire of potential objectives that can be adopted by a variety of programs. The evaluator has already worked on the technical quality of the goals so program staff can focus on selecting the content they want. Where there is the time and inclination, however, I prefer to have a group work on its own goals statement so that participants feel ownership and understand what commitments have been made. This can be part of the training function served by evaluators, increasing the likelihood that staff will have success in future goals clarification exercises.

Levels of Goal Specification:
From Overall Mission to Specific Objectives

To facilitate framing evaluation questions in complex programs, evaluators may have to work with primary stakeholders to clarify purposes at three levels: the overall mission of the program or organization, the goals of specific programmatic units (or subsystems), and the specific objectives that specify desired outcomes. The mission statement describes the general direction of the overall program or organization in long-range terms. The peacetime mission of the U.S. Army is simply "readiness." A mission statement may specify a target population and a basic problem to be attacked. For example, the mission of the Minnesota Comprehensive Epilepsy Program was to "improve the lives of people with epilepsy."

The terms goals and objectives have been used interchangeably up to this point, but it is useful to distinguish between them as representing different levels of generality. Goals are more general than objectives and encompass the purposes and aims of program subsystems (i.e., research, education, and treatment in the epilepsy example). Objectives are narrow and specific, stating what will be different as a result of program activities, that is, the concrete outcomes of a program. To illustrate these differences, a simplified version of the mission statement, goals, and objectives for the Minnesota Comprehensive Epilepsy Program is presented in Exhibit 7.6. This outline was developed after an initial discussion with the program director. The purpose of the outline was to establish a context for later discussions aimed at more clearly framing specific evaluation questions. In other words, we used this goals clarification and objectives mapping exercise as a means of focusing the evaluation question rather than as an end in itself.

The outline of goals and objectives for the Epilepsy Project (Exhibit 7.6) illustrates several points. First, the only dimension that consistently differentiates goals and

objectives is the relative degree of specificity of each: objectives narrow the focus of goals. There is no absolute criterion for distinguishing goals from objectives; the distinction is always a relative one.

Second, this outline had a specific evaluation purpose: to facilitate priority setting as I worked with primary intended users to focus the evaluation. Resources were insufficient to fully evaluate all three component parts of the program. Moreover, different program components faced different contingencies. Treatment and research had more concrete outcomes than education. The differences in the specificity of the objectives for the three components reflect real differences in the degree to which the content and functions of those program subsystems were known at the beginning of the evaluation. Thus, with limited resources and variations in goal specificity, it was necessary to decide which aspects of the program could best be served by evaluation.

Third, the outline of goals and objectives for the Comprehensive Epilepsy Program is not particularly well written. I constructed the outline from notes taken during my first meeting with the director. At this early point in the process, the outline was a tool for posing this question to evaluation decision makers: Which program components, goals, and objectives should be evaluated to produce the most useful information for program improvement and decision making? That is the question. To answer it, one does not need technically perfect goal statements. Once the evaluation is focused, relevant goals and objectives can be reworked as necessary. The point is to avoid wasting time in the construction of grandiose, complex models of program goals and objectives just because the folklore of evaluation prescribes such an exercise. In complex programs, evaluators can spend so much time working on goals statements that considerable momentum is lost.

Establishing Priorities:
Importance Versus Utility

Let me elaborate the distinction between writing goals for the sake of writing goals and writing them to use as tools in narrowing the focus of an evaluation. In utilization-focused evaluation, goals are prioritized in a manner quite different from that usually prescribed. The usual criterion for prioritizing goals is ranking or rating in terms of importance (Edwards, Guttentag, and Snapper 1975; Gardiner and Edwards 1975). The reason seems commonsensical: Evaluations ought to focus on important goals. But, from a utilization-focused perspective, what appears to be most sensible may not be most useful.

The most important goal may not be the one that decision makers and intended users most need information about. In utilization-focused evaluation, goals are also prioritized on the basis of what information is most needed and likely to be most useful, given the evaluation's purpose. For example, a summative evaluation would likely evaluate goals in terms of overall importance, but a formative evaluation might focus on a goal of secondary importance because it is an area being neglected or proving particularly troublesome.

Ranking goals by importance is often quite different from ranking them by the utility of evaluative information needed. Exhibit 7.7 provides an example from the Minnesota Comprehensive Epilepsy Program, contrasting goals ranked by importance and utility. Why the discrepancy? The

EXHIBIT 7.6
Minnesota Comprehensive Epilepsy Program:
Mission Statement, Goals, and Objectives

Program Mission: Improve the lives of people with epilepsy

Research Component

Goal 1: Produce high-quality, scholarly research on epilepsy
    Objective 1: Publish research findings in high-quality, refereed journals
    Objective 2: Contribute to knowledge about:
        a. neurological aspects of epilepsy
        b. pharmacological aspects of epilepsy
        c. epidemiology of epilepsy
        d. social and psychological aspects of epilepsy

Goal 2: Produce interdisciplinary research
    Objective 1: Conduct research projects that integrate principal investigators from different disciplines
    Objective 2: Increase meaningful exchanges among researchers from different disciplines

Education Component

Goal 3: Health professionals will know the nature and effects of epilepsy behaviors
    Objective 1: Increase the knowledge of health professionals who serve people with epilepsy so that they know:
        a. what to do if a person has a seizure
        b. the incidence and prevalence of epilepsy
    Objective 2: Change the attitudes of health professionals so that they:
        a. are sympathetic to the needs of people with epilepsy
        b. believe in the importance of identifying the special needs of people with epilepsy

Goal 4: Educate persons with epilepsy about their disorder

Goal 5: Inform the general public about the nature and incidence of epilepsy

Treatment Component

Goal 6: Diagnose, treat, and rehabilitate persons with severe, chronic, and disabling seizures
    Objective 1: Increase seizure control in treated patients
    Objective 2: Increase the functioning of patients
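For readers who find a structural sketch clarifying, the mission-goals-objectives outline in Exhibit 7.6 can be thought of as a nested hierarchy in which each level narrows the focus of the one above it. The following Python fragment is purely illustrative: the data structure, field names, and helper function are assumptions introduced here, not part of the Epilepsy Program's actual documentation, and only a subset of the exhibit is shown.

```python
# Illustrative sketch only: the mission -> goals -> objectives hierarchy
# of Exhibit 7.6 rendered as a nested structure. Names are hypothetical.
epilepsy_program = {
    "mission": "Improve the lives of people with epilepsy",
    "components": {
        "Research": {
            "Produce high-quality, scholarly research on epilepsy": [
                "Publish research findings in high-quality, refereed journals",
                "Contribute to knowledge about epilepsy",
            ],
        },
        "Treatment": {
            "Diagnose, treat, and rehabilitate persons with severe, "
            "chronic, and disabling seizures": [
                "Increase seizure control in treated patients",
                "Increase the functioning of patients",
            ],
        },
    },
}

def objectives_for(program, component):
    """Return (goal, objectives) pairs for one program component."""
    return list(program["components"][component].items())

for goal, objectives in objectives_for(epilepsy_program, "Treatment"):
    print(goal)
    for objective in objectives:
        print("  -", objective)
```

The point of the sketch is simply that goals and objectives differ only in their level in the hierarchy, which is the first lesson drawn from the exhibit in the text.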


EXHIBIT 7.7
Minnesota Comprehensive Epilepsy Program:
Goals Ranked by Importance to Program Versus Goals Ranked
by Utility of Evaluative Information Needed by Primary Users

Ranking of Goals by Importance
1. Produce high-quality scholarly research on epilepsy
2. Produce interdisciplinary research
3. Integrate the separate components into a whole
4. Diagnose, treat, and rehabilitate people with chronic and disabling seizures

Ranking of Goals by Usefulness of Evaluative Information to Intended Users
1. Integrate the separate program components into a comprehensive whole that is greater than the sum of its parts
2. Educate health professionals about epilepsy
3. Diagnose, treat, and rehabilitate people with chronic and disabling seizures
4. Produce interdisciplinary research
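The gap between the two rankings in Exhibit 7.7 can be made concrete with a small computation. The sketch below is illustrative only: the rank numbers follow the exhibit, but the shortened goal labels, the function, and the choice to treat an unranked goal as rank 5 are assumptions introduced here.

```python
# Illustrative sketch: flag goals whose importance rank and usefulness
# rank diverge (ranks follow Exhibit 7.7; labels are abbreviated).
importance = {
    "Produce high-quality scholarly research": 1,
    "Produce interdisciplinary research": 2,
    "Integrate components into a comprehensive whole": 3,
    "Diagnose, treat, and rehabilitate people with seizures": 4,
}
usefulness = {
    "Integrate components into a comprehensive whole": 1,
    "Educate health professionals about epilepsy": 2,
    "Diagnose, treat, and rehabilitate people with seizures": 3,
    "Produce interdisciplinary research": 4,
}

def rank_discrepancies(importance, usefulness):
    """Return (gap, goal, importance_rank, usefulness_rank) tuples for
    goals whose two ranks differ, largest gap first. A goal missing from
    one list is treated, by assumption, as ranked 5 on that list."""
    rows = []
    for goal in set(importance) | set(usefulness):
        imp = importance.get(goal)
        use = usefulness.get(goal)
        if imp != use:
            gap = abs((imp or 5) - (use or 5))
            rows.append((gap, goal, imp, use))
    return sorted(rows, reverse=True)

for gap, goal, imp, use in rank_discrepancies(importance, usefulness):
    print(f"gap={gap}  importance={imp}  usefulness={use}  {goal}")
```

Read this way, the largest discrepancies are exactly where the text locates the utilization-focused conversation: the most important goal (scholarly research) does not appear on the usefulness list at all, while the education goal appears only there.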

staff did not feel they needed a formal evaluation to monitor attainment of the most important program goal. The publishing of scholarly research in refereed journals was so important that the director was committed to personally monitor performance in that area. Moreover, he was relatively certain about how to achieve that outcome, and he had no specific evaluation question related to that goal that he needed answered. By contrast, the issue of comprehensiveness was quite difficult to assess. It was not at all clear how comprehensiveness could be facilitated, although it was third on the importance list. Data on comprehensiveness had high formative utility.

The education goal, second on the usefulness list, does not even appear among the top four goals on the importance list. Yet, information about educational impact was ranked high on the usefulness list because it was a goal area about which the program staff had many questions. The education component was expected to be a difficult, long-term effort. Information about how to increase the educational impact of the Comprehensive Epilepsy Program had high use potential. In a utilization-focused approach, the primary intended users make the final decision about evaluation priorities.

In my experience, the most frequent reason for differences in importance and usefulness rankings is variation in the degree to which decision makers already have what they consider good information about performance on the most important goal. At the program level, staff members may be so involved in trying to achieve their most important goal that they are relatively well informed about performance on that goal. Performance on less important goals may

involve less certainty for staff; information about performance in that goal area is therefore more useful because it tells staff members something they do not already know.

What I hope is emerging through these examples is an image of the evaluator as an active-reactive-adaptive problem solver. The evaluator actively solicits information about program contingencies, organizational dynamics, environmental uncertainties, and decision makers' goals in order to focus the evaluation on questions of real interest and utility to primary intended users.

Evaluation of Central Versus Peripheral Goals

Prioritizing goals on the basis of perceived evaluative utility means that an evaluation might focus on goals of apparent peripheral importance rather than more central program goals. This is a matter of some controversy. In her early work, Weiss (1972b) offered the following advice to evaluators:

    The evaluator will have to press to find out priorities—which goals the staff sees as critical to its mission and which are subsidiary. But since the evaluator is not a mere technician for the translation of a program's stated aims into measurement instruments, he has a responsibility to express his own interpretation of the relative importance of goals. He doesn't want to do an elaborate study on the attainment of minor and innocuous goals, while some vital goals go unexplored. (pp. 30-31; emphasis added)

Contrast that advice with the perspective of an evaluator from our study of use of federal health evaluations:

    I'd make this point about minor evaluation studies. If you have an energetic, conscientious program manager, he's always interested in improving his program around the periphery, because that's where he usually can. And an evaluation study of some minor aspect of his program may enable him to significantly improve. [EV52:171]

In our study, we put the issue to decision makers and evaluators as follows:

    Another factor sometimes believed to affect use has to do with whether the central objectives of a program are evaluated. Some writers argue that evaluations can have the greatest impact if they focus on major program objectives. What happened in your case?

The overwhelming consensus was that, at the very least, central goals ought to be evaluated and, where possible, both central and peripheral goals should be studied. As they elaborated, nine decision makers and eight evaluators said that utilization had probably been increased by concentrating on central issues. This phrase reflects an important shift in emphasis. As they elaborated their answers about evaluating central versus peripheral goals, they switched from talking about goals to talking about "issues." Utilization is increased by focusing on central issues. And what is a central issue? It is an evaluation question that someone really cares about. The subtle distinction here is critical. Evaluations are useful to decision makers if they focus on central issues—which may or may not include evaluating attainment of central goals.

The Personal Factor Revisited

Different people will have different perceptions of what constitutes central program goals or issues. Whether it is the evaluator's opinion about centrality, the funder's, some special interest group's perspective, or the viewpoints of program staff and participants, the question of what constitutes central program goals and objectives remains an intrinsically subjective one. It cannot be otherwise. The question of central versus peripheral goals cannot really be answered in the abstract. The question thus becomes: central from whose point of view? The personal factor (Chapter 3) intersects the goals clarification process in a utilization-focused evaluation. Increasing use is largely a matter of matching: getting information about the right questions, issues, and goals to the right people.

Earlier in this chapter, I compared the goals clarification process to the party game of Twenty Questions. Research indicates that different individuals behave quite differently in such a game (and, by extension, in any decision-making process). Worley (1960), for example, studied subjects' information-seeking endurance in the game under experimental conditions. Initially, each subject was presented with a single clue and given the option of guessing what object the experimenter had in mind or of asking for another clue. This option was available after each new clue, but a wrong guess would end the game. Worley found large and consistent individual differences in the amount of information players sought. Donald Taylor (1965) cites the research of Worley and others as evidence that decision-making and problem-solving behavior is dynamic, highly variable, and contingent upon both situational and individual characteristics. This does not make the evaluator's job any easier. It does mean that the personal factor remains the key to evaluation use. The careful selection of knowledgeable, committed, and information-valuing people makes the difference. The goals clarification game is most meaningful when played by people who are searching for information because it helps them focus on central issues without letting the game become an end in itself or turning it into a contest between staff and evaluators.

The Goals Paradox

This chapter began with an evaluation of Nasrudin's hunting trip in search of bears. For Nasrudin, that trip ended with the "marvelous" outcome of seeing no bears. Our hunting trip in search of the role of goals in evaluation has no conclusive ending because the information needs of primary intended users will vary from evaluation to evaluation and situation to situation. Focusing an evaluation on program goals and objectives is clearly not the straightforward, logical exercise depicted by the classical evaluation literature because decision making in the real world is not purely rational and logical. This is the paradox of goals. They are rational abstractions in nonrational systems. Statements of goals emerge at the interface between the ideals of human rationality and the reality of diverse human values and ways of thinking. Therein lies their strength and their weakness. Goals provide direction for action and evaluation, but only for those who share in the values expressed by the goals. Evaluators live inside that paradox.

One way out of the paradox is to focus the evaluation without making goal attainment the central issue. The next chapter considers alternatives to goals-based evaluation.

Note

1. This material, and related information in the chapter, has been adapted and used with permission from the Minnesota Department of Human Services.
Focusing an Evaluation
Alternatives to Goals-Based Evaluation

Creative thinking may mean simply the realization that there's no particular virtue in doing things the way they always have been done.
—Rudolf Flesch

If you can see in any given situation only what everybody else can see, you can be said to be so much a representative of your culture that you are a victim of it.
—S. I. Hayakawa

More Than One Way to Manage a Horse

Here is a story about the young Alexander from Plutarch (1952):

There came a day when Philoneicus the Thessalian brought King Philip a horse named
Bucephalus, which he offered to sell for 13 talents. The king and his friends went down
to the plain to watch the horse's trials and came to the conclusion that he was wild and
quite unmanageable, for he would allow no one to mount him, nor would he endure
the shouts of Philip's grooms, but reared up against anyone who approached. The king
became angry at being offered such a vicious unbroken animal and ordered it led away.
But Alexander, who was standing close by, remarked, "What a horse they are losing,
and all because they don't know how to handle him, or dare not try!"

King Philip kept quiet at first, but when he heard Alexander repeat these words and
saw that he was upset, he asked him: "Do you think you know more than your elders
or can manage a horse better?"
"I could manage this one better," retorted Alexander.
"And if you cannot," said his father, "what penalty will you pay for being so
"I will pay the price of the horse," answered the boy. At this, the whole company
burst out laughing. As soon as the father and son had settled the terms of the bet,
Alexander went quickly up to Bucephalus, took off his bridle, and turned him towards
the sun, for he had noticed that the horse was shying at the sight of his own shadow, as
it fell in front of him and constantly moved whenever he did. He ran alongside the
animal for a little way, calming him down by stroking him, and then, when he saw he
was full of spirit and courage, he quietly threw aside his cloak and with a light spring
vaulted safely onto his back. For a little while, he kept feeling the bit with the reins,
without jarring or tearing his mouth, and got him collected. Finally, when he saw that
the horse was free of his fears and impatient to show his speed, he gave him his head
and urged him forward, using a commanding voice and touch of the foot.
King Philip held his breath in an agony of suspense until he saw Alexander reach the
end of his gallop, turn in full control, and ride back triumphant, exulting in his success.
Thereupon the rest of the company broke into loud applause, while his father, we are
told, actually wept for joy. When Alexander had dismounted, he kissed him and said:
"My boy, you must find a kingdom big enough for your ambitions. Macedonia is too
small for you."

Young Alexander, later to be Alexander the Great, showed that there was more than one way to manage a horse. What I like most about this story, as a metaphor for managing an evaluation, is that he based his approach to the horse on careful observations of the horse and situation. He noticed that the horse was afraid of its shadow, so he turned him toward the sun. He established a relationship with the wild animal before mounting it. He was sensitive to the horse's response to the bit and reins. Alexander exemplified being active, reactive, and adaptive.

More Than One Way to Focus an Evaluation

The last chapter focused on goals and outcomes as traditional ways to focus an evaluation. A program with clear, specific, and measurable goals is like a horse already trained for riding. Programs with multiple, conflicting, and still developing or ever-changing goals can feel wild and risky to an evaluator whose only experience is with seasoned and trained horses. This chapter will examine why goals-based evaluation often doesn't work and offer alternatives for focusing an evaluation. Just as there's more than one way to manage a wild horse, there's more than one way to manage evaluation of a seemingly chaotic program.

Problems With Goals-Based Evaluation

One can conduct useful evaluations without ever seeing an objective.
—Smith 1980:39

Alternatives to goals-based evaluation Another critique of goals is that they're

have emerged because of the problems often unreal. Since I've argued that evalu-
evaluators routinely experience in attempt- ation is grounded in reality testing, it be-
ing to focus on goals. In addition to fuzzy hooves us to examine the reality of goals.
goals and conflicts over goals—problems To "reify" is to treat an abstraction as if it
addressed in the previous chapter—mea- were real. Goals have been a special target
suring goal attainment can overpoliticize of social scientists concerned with concept
goals. In this regard, Lee J. Cronbach and reification. For example, Cyert and M a r c h
associates (1980) at the Stanford Evalu- (1963:28) have asserted that individual
ation Consortium have warned about the people have goals, collectivities of people
distortions that result when program staff do not. They likewise asserTthat only indi-
pay too much attention to what an evalu- viduals can act; organizations or programs,
ator decides to measure, essentially giving as such, cannot be said to take action. T h e
the evaluator the power to determine what future state desired by an organization (its
activities become primary in a program. goals) is nothing but a function of individ-
ual "aspirations." „ / . . . • ''• -'
It is unwise for evaluation to focus on Azumi and Hage (1972) reviewed the
whether a project has "attained its goals." debate about whether organizations have
Goals are a necessary part of political rheto- goals and concluded, "Organizational soci-
ric, but all social programs, even supposedly ologists have found it useful to assume that
targeted ones, have broad aims. Legislators organizations are p u r p o s i v e . . . . However,
who have sophisticated reasons for keeping it has been much more difficult to actually
goal statements lofty and nebulous unblush- measure the goals of an organization. Re-
ingly ask program administrators to state searchers find the purposive image helpful
explicit goals. Unfortunately, whatever the but somehow elusive" (p. 414).
evaluator decides to measure tends to be- In brief, social scientists who study goals
come a primary goal of program operators, are not quite sure what they are studying.
(p. 5) Goals analysis as a field of study is com-
plex, chaotic, controversial, and confusing.
In other w o r d s , w h a t gets measured In the end, most researchers follow the
gets done. An example is when teachers focus on whether students can pass a reading test rather than on whether they learn to read. The result can be students who pass mandated competency tests but are still functionally illiterate.

pragmatic logic of organizational sociologist Charles Perrow (1970):

    For our purposes we shall use the concept of an organizational goal as if there were no question concerning its legitimacy, even though we recognize that there are legitimate objections to doing so. Our present state of conceptual development, linguistic practices, and ontology (knowing whether something exists or not) offers us no alternative. (p. 134)

Like Perrow, evaluators are likely to come down on the side of practicality. The language of goals will continue to dominate evaluation. By introducing the issue of goals reification, I have hoped merely to induce a modicum of caution and compassion among evaluators before they impose goals clarification exercises on program staff. Given the way organizational sociologists have gotten themselves tangled up in the question of whether program-level goals actually exist, it is just possible that difficulties in clarifying a program's goals may be due to problems inherent in the notion of goals rather than staff incompetence, intransigence, or opposition to evaluation. Failure to appreciate these difficulties and proceed with sensitivity and patience can create staff resistance that is detrimental to the entire evaluation process.

I have also hoped that reviewing the conceptual and operational problems with goals would illuminate why utilization-focused evaluation does not depend on clear, specific, and measurable objectives as the sine qua non of evaluation research. Clarifying goals is neither necessary nor appropriate in every evaluation.

Turbulent Environments and Goals

The extent to which evaluators should seek clarity about goals will depend, among other things, on the program's developmental status and environment. Organizational sociologists have discovered that the clarity and stability of goals are contingent on the organization's environment. Emery and Trist (1965) identified four types of organizational environments characterized by varying degrees of uncertainty facing the organization. Uncertainty includes things like funding stability, changes in rules and regulations, mobility and transience of clients and suppliers, and political, economic, or social turbulence. What is important about their work from an evaluation perspective is the finding that the degree of uncertainty facing an organization directly affects the degree to which goals and strategies for attaining goals can be made concrete and stable. The less certain the environment, the less stable and less concrete the organization's goals will be. Effective organizations in turbulent environments adapt their goals to changing demands and conditions.

In practical terms, this means that the more unstable and turbulent the environment of a program, the less likely it is that the evaluator will be able to generate concrete and stable goals. Second, few evaluations can investigate and assess all the many programmatic components and special projects of an agency, organization, or program. The clarity, specificity, and measurability of goals will vary throughout a program, depending on the environmental turbulence faced by specific projects and program subparts. As an evaluator works with primary intended users to focus the evaluation, the degree to which it is useful to labor over writing a goals statement will vary for different parts of the program. It will not be efficient or useful to force developing and adapting programs into a static and rigid goals model. Developmental evaluation, discussed in Chapter 5, is one way of being a useful form of evaluation in innovative settings where goals are emergent and changing rather than predetermined and fixed. Another alternative is goal-free evaluation.
Focusing an Evaluation • 181

Goal-Free Evaluation

Philosopher-evaluator Michael Scriven has been a strong critic of goals-based evaluation and, as an alternative, an advocate of what he has called goal-free evaluation. Goal-free evaluation involves gathering data on a broad array of actual effects and evaluating the importance of these effects in meeting demonstrated needs. The evaluator makes a deliberate attempt to avoid all rhetoric related to program goals. No discussion about goals is held with staff, and no program brochures or proposals are read; only the program's actual outcomes and measurable effects are studied, and these are judged on the extent to which they meet demonstrated participant needs.

Scriven (1972b) has offered four reasons for doing goal-free/needs-based evaluation:

1. To avoid the risk of narrowly studying stated program objectives and thereby missing important unanticipated outcomes

2. To remove the negative connotations attached to the discovery of unanticipated effects, because "the whole language of 'side-effect' or 'secondary effect' or even 'unanticipated effect' tended to be a put-down of what might well be the crucial achievement, especially in terms of new priorities" (pp. 1-2)

3. To eliminate the perceptual biases introduced into an evaluation by knowledge of goals

4. To maintain evaluator objectivity and independence through goal-free conditions

In Scriven's (1972b) own words:

    It seemed to me, in short, that consideration and evaluation of goals was an unnecessary but also a possibly contaminating step. I began work on an alternative approach—simply the evaluation of actual effects against a profile of demonstrated needs. I call this Goal-Free Evaluation. . . .
    The less the external evaluator hears about the goals of the project, the less tunnel vision will develop, the more attention will be paid to looking for actual effects (rather than checking on alleged effects). (p. 2; emphasis in original)

Scriven (1972b) distrusted the grandiose goals of most projects. Such great and grandiose proposals "assume that a gallant try at Everest will be perceived more favorably than successful mounting of molehills. That may or may not be so, but it's an unnecessary noise source for the evaluator" (p. 3). He saw no reason to get caught up in distinguishing alleged goals from real goals: "Why should the evaluator get into the messy job of trying to disentangle that knot?" He would also avoid goals conflict and goals war: "Why try to decide which goal should supervene?" He even countered the goals clarification shuffle:

    Since almost all projects either fall short of their goals or overachieve them, why waste time rating the goals, which usually aren't what is achieved? Goal-free evaluation is unaffected by—and hence does not legislate against—the shifting of goals midway in a project.

Scriven (1991b) also dealt with the fuzziness problem: "Goals are often stated so vaguely as to cover both desirable and undesirable activities, by almost anyone's standards. Why try to find out what was really intended—if anything?" Finally, he has argued that "if the program is achieving its stated goals and objectives, then these will show up" in the goal-free interviews with and observations of program participants done to determine actual impacts (p. 180).

Sometimes the result of goal-free evaluation is a statement of goals; that is, rather than being the initial focus of the evaluation process, a statement of operating goals becomes its outcome. Scriven, however, considers this inappropriate:

    It often happens in goal-free evaluation that people use this as a way of working out what the goals are, but I discourage them from trying to do that. That's not the point of it. The outcome is an assessment of the merit of the program.
    A better way to put the trouble with the name goal-free is to say that you might put it better by saying it is needs-based instead of goal-based. It is based on something, namely the needs of the client or recipient, but it isn't based on the goals of the program people and you never need to know those and you shouldn't ever look at them. As far as the idea that you finally come up with them as a conclusion, you'd be surprised the extent to which you don't. (Scriven and Patton 1976:13-14; emphasis added)

Some critics of Scriven have countered that goal-free evaluation only appears to get rid of goals. The only goals really eliminated are those of local project staff. Scriven replaces staff objectives with more global goals based on societal needs and basic standards of morality. Under a goal-free approach, only the evaluator knows for sure what those needs and standards are, although Scriven (1972b) considers such standards to be as obvious as the difference between soap and cancer:

    Another error is to think that all standards of merit are arbitrary or subjective. There's nothing subjective about the claim that we need a cure for cancer more than a new brand of soap. The fact that some people have the opposite preference (if true) doesn't even weakly undermine the claim about which of these alternatives the nation needs most. So the Goal-Free Evaluation may use needs and not goals, or the goals of the consumer or the funding agency. Which of these is appropriate depends on the case. But in no case is it proper to use anyone's as the standard unless they can be shown to be the appropriate ones and morally defensible. (pp. 3-4)

As a philosopher, Scriven may feel comfortable specifying what "the nation needs" and designating standards as "morally defensible." But from a utilization-focused perspective, this simply begs the question of who is served by the information collected. The issue is not which goals are better or worse, moral or immoral, appropriate or inappropriate, in any objective sense. The issue is whose goals will be evaluated. Scriven's goal-free model eliminates only one group from the game: local project staff. He directs data in only one clear direction—away from the stated concerns of the people who run the program. He addresses an external audience, such as legislative funders. But, inasmuch as these audiences are ill defined and lack organization, I am unconvinced that the standards he applies are anything other than his very own preferences about what program effects are appropriate and morally defensible. Scriven's denial notwithstanding (cf. Scriven 1972b:3), goal-free evaluation carries the danger of substituting the evaluator's goals for those of the project. Marvin Alkin (1972) has made essentially the same point:

    This term "Goal-Free Evaluation" is not to be taken literally. The Goal-Free Evaluation does recognize goals (and not just idiosyncratic ones), but they are to be wider context goals rather than the specific objectives of a program. . . . By "goal-free" Scriven simply means that the evaluator is free to choose a wide context of goals. By his description, he implies that a goal-free evaluation is always free of the goals of the specific program and sometimes free of the goals of the program sponsor. In reality, then, goal-free evaluation is not really goal-free at all, but is simply directed at a different and usually wide decision audience. The typical goal-free evaluator must surely think (especially if he rejects the goals of the sponsoring agency) that his evaluation will extend at least to the level of "national policy formulators." The question is whether this decision audience is of the highest priority. (p. 11)

It should be noted that Scriven's goal-free proposal assumes both internal and external evaluators. Thus, part of the reason the external evaluators can ignore program staff and local project goals is because the internal evaluator takes care of all that. Thus, again, goal-free evaluation is only partially goal-free. Someone has to stay home and mind the goals while the external evaluators search for any and all effects. As Scriven (1972b) has argued,

    Planning and production require goals, and formulating them in testable terms is absolutely necessary for the manager as well as the internal evaluator who keeps the manager informed. That has nothing to do with the question of whether the external evaluator needs or should be given any account of the project's goals. (p. 4)

In later reflections, Scriven (1991b:181) proposed "hybrid forms" in which one part of a comprehensive evaluation includes a goal-free evaluator working parallel to a goals-based evaluator. This solves the potential problem that, if evaluators need not know where a program is headed to evaluate where it ends up, then program staff might embrace this logic and, likewise, decide to eschew goals. Under a pure goal-free approach, program staff need only wait until the goal-free evaluator determines what the program has accomplished and then proclaim those accomplishments as their original goals. Ken McIntyre (1976) has described eloquently just such an approach to evaluation in a poem addressed to program staff.

    Your program's goals you need a way of knowing.
    You're sure you've just about arrived,
    But where have you been going?
    So, like the guy who fired his rifle at a 10-foot curtain
    And drew a ring around the hole to make a bull's-eye certain,
    It's best to wait until you're through
    And then see where you are:
    Deciding goals before you start is riskier by far.
    So, if you follow my advice in your evaluation,
    You'll start with certainty
    And end with self-congratulation. (p. 39)

There have been several serious critiques of goal-free evaluation (see Alkin 1972; Kneller 1972; Popham 1972; Stufflebeam 1972), much of the criticism focused on the label as much as on the substance. Scriven's critique of goals-based evaluation, however, is useful in affirming why evaluators need more than one way of focusing an evaluation.

Evaluation will not be well served by dividing people into opposing camps: pro-goals versus anti-goals evaluators. I am reminded of an incident at the University of Wisconsin during the student protests over the Vietnam War. Those opposed to the war were often labeled communists. At one demonstration, both anti-war and pro-war demonstrators got into a scuffle, so police began making arrests indiscriminately. When one of the pro-war demonstrators was apprehended, he began yelling, "You've got the wrong person. I'm anti-communist!" To which the police officer replied, "I don't care what kind of communist you are, you're going to jail." Well, I don't care what kind of evaluator you are, to be effective you need the flexibility to evaluate with or without goals. The utilization-focused evaluation issue is what information is needed by primary intended users, not whether goals are clear, specific, and measurable. Let's consider, then, some other alternatives to goals-based evaluation.

A Menu Approach to Focusing Evaluations

Menu 8.1 at the end of this chapter offers an extensive list of alternative ways of focusing an evaluation. I'll elaborate on only a few of these here.

Focusing on future decisions. An evaluation can be focused on information needed to inform future decisions.

Proponents and opponents of school busing for desegregation may never agree on educational goals, but they may well agree on what information is needed to inform future debate, for example, data about who is bused, at what distances, from what neighborhoods, and with what effects.

Focusing on critical issues or concerns. When the Minnesota Legislature first initiated Early Childhood Family Education programs, some legislators were concerned about what instruction and advice were being given to parents. The evaluation focused on this issue, and the evaluators became the eyes and ears for the Legislature and general public at a time of conflict about "family values" and anxiety about values indoctrination. The evaluation, based on descriptions of what actually occurred and data on parent reactions, helped put this issue to rest. Now, 20 years later, the latest evaluation of this program (Mueller 1996) has focused on the issue of universal access. Should the program be targeted to low-income parents or continue to be available to all parents, regardless of income? What are the effects on parents of a program that integrates people of different socioeconomic backgrounds? And, as before, this issue has been raised in the Legislature. Both these evaluations, then, were issue-based more than goals-based, although attention to differential parent outcomes was subsumed within the issues.

The "responsive approach" to evaluation. Stake (1975) advocates incorporating into an evaluation the various points of view of constituency groups under the assumption that "each of the groups associated with a program understands and experiences it differently and has a valid perspective" (Stecher and Davis 1987:56-57). The focus, then, is on informing each group of the perspective of other groups.

Focusing on questions. In Chapter 2, I described focusing an evaluation in Canada by having primary intended users generate questions that they wanted answered—without regard to methods, measurement, design, resources, precision—just 10 basic questions, real questions that they considered important.

After working individually and in small groups, we pulled back together and generated a single list of 10 basic evaluation questions—answers to which, they agreed, could make a real difference to the operations of the school division. The questions were phrased in their terms, incorporating important local nuances of meaning and circumstance. Most important, they had discovered that they had questions they cared about—not my questions but their questions, because during the course of the exercise it had become their evaluation.

Generating a list of real and meaningful evaluation questions played a critical part in getting things started. Exhibit 2.4 in Chapter 2 offers criteria for good utilization-focused questions.

It is worth noting that formulating an appropriate and meaningful question involves considerable skill and insight. In her novel The Left Hand of Darkness, science fiction author Ursula K. Le Guin (1969) reminds us that questions and answers are precious resources, not to be squandered or treated casually. She shows us that how one poses a question frames the answer one gets—and its utility. In the novel, the character Herbor makes an arduous journey to the fortune tellers who convene rarely and, when they do, permit the asking of only a single question. His mate is obsessed with death, so Herbor asks them how long his mate will live. Herbor returns home to tell his mate the answer, that Herbor will die before his mate. His mate is enraged: "You fool! You had a question of the Foretellers, and did not ask them when I am to die, what day, month, year, how many days are left to me—you asked how long? Oh you fool, you staring fool, longer than you, yes, longer than you!" And with that his mate struck him with a great stone and killed him, fulfilling the prophecy and driving the mate into madness. (pp. 45-46)

A "Seat-of-the-Pants" Approach

In our follow-up study of how federal health evaluations were used, we came across a case example of using issues and questions to focus an evaluation. The decision makers in that process, for lack of a better term, called how they designed the evaluation a "seat-of-the-pants" approach. I would call it focusing on critical issues. The results influenced major decisions about the national Hill-Burton Hospital Construction Program. This evaluation illustrates some key characteristics of utilization-focused evaluation.

The evaluation was mandated in federal legislation. The director of the national Hill-Burton program established a permanent committee on evaluation to make decisions about how to spend evaluation funds. The committee included representatives from various branches and services in the division: people from the state Hill-Burton agencies, from the Comprehensive Health Planning agencies, from the health care industry, and regional Hill-Burton people. The committee met at regular intervals to "kick around" evaluation ideas. Everyone was free to make suggestions. Said the director, "If the committee thought a suggestion was worthwhile, we would usually give the person that suggested it an opportunity to work it up in a little more detail" [DM159:3]. The program officer commented that the final report looked systematic and goals-based, but

    that's not the kind of thinking we were actually doing at that time . . . We got started by brainstorming: "Well, we can look at the funding formula and evaluate it." And someone said, "Well, we can also see what state agencies are doing." See? And it was this kind of seat-of-the-pants approach. That's the way we got into it. [PO159:4]

The evaluation committee members were carefully selected on the basis of their knowledge of central program issues. While this was essentially an internal evaluation, the committee also made use of outside experts. The director reported that the committee was the key to the evaluation's use: "I think the makeup of the committee was such that it helped this study command quite a lot of attention from the state agencies and among the federal people concerned" [DM159:18].

Here, then, we have a case example of the first two steps in utilization-focused evaluation: (1) identifying and organizing primary intended users of the evaluation and (2) focusing the evaluation on their interests and what they believe will be useful. And how do you keep a group like this working together?

Director: Well, I think this was heavily focused toward the major aspects of the program that the group was concerned about.

Interviewer: Did the fact that you focused on major aspects of the program make
a difference in how the study was used?

Decision maker: It made a difference in the interest with which it was viewed by
people. . . . I think if we hadn't done that, if the committee hadn't
been told to go ahead and proceed in that order, and given the
freedom to do that, the committee itself would have lost interest.
The fact that they felt that they were going to be allowed to pretty
well free-wheel and probe into the most important things as they
saw them, I think that had a lot to do with the enthusiasm with
which they approached the task. [DM159:22]

The primary intended users began by brainstorming issues ("seat-of-the-pants approach") but eventually framed the evaluation question in the context of major policy concerns that included, but were not limited to, goal attainment. They negotiated back and forth—acting, reacting, adapting—until they determined and agreed on the most relevant focus for the evaluation.

Changing Focus Over Time: Stage Models of Evaluation

Evaluate no program until it is proud.

—Donald Campbell (1983)

Important to focusing an evaluation can be matching the evaluation to the program's stage of development, what Tripodi, Fellin, and Epstein (1971) called differential evaluation. Evaluation priorities can vary at the initiation stage (when resources are being sought), the contact stage (when the program is just getting under way), and the full implementation stage.

In a similar vein, Jacobs (1988) has conceptualized a "five-tier" approach: (1) the preimplementation tier, focused on needs assessment and design issues; (2) the accountability tier, to document basic functioning to funders; (3) the program clarification tier, focused on improvement and feedback to staff; (4) the "progress toward objectives" tier, focused on immediate, short-term outcomes and differential effectiveness among clients; and (5) the "program impact" tier, which focuses on overall judgments of effectiveness, knowledge about what works, and model specification for replication.

The logic of these stage models of evaluation is that, not only do the questions evolve as a program develops, but the stakes go up. When a program begins, all kinds of things can go wrong, and, as we'll see in the next chapter on implementation evaluation, all kinds of things typically do go wrong. It is rare that a program unfolds

as planned. Before committing major resources to overall effectiveness evaluation, then, a stage model begins by making sure the groundwork was carefully laid during the needs assessment phase; then basic implementation issues are examined and formative evaluation for improvement becomes the focus; if the early results are promising, then and only then, are the stakes raised by conducting rigorous summative evaluation. It was to this kind of staging of evaluation that Donald Campbell (1983), one of the most distinguished social scientists of the twentieth century, was referring when he implored that no program should be evaluated before it is "proud." Only when program staff have reached a point where they and others close to the program believe that they're on to something, "something special that we know works here and we think others ought to borrow," should rigorous summative evaluation be done to assess the program's overall merit and worth (Schorr 1988:269-70).

An example may help clarify why it's so important to take into account a program's stage of development. The Minnesota State Department of Education funded a "human liberation" course in the Minneapolis public schools aimed at enhancing communication skills around issues of sexism and racism. Funding was guaranteed for three years, but a renewal application with evaluation findings had to be filed each year. To ensure rigorous evaluation, an external, out-of-state evaluator was hired. When the evaluator arrived on the scene, virtually everything about the program was uncertain: curriculum content, student reaction, staffing, funding, relationship to the school system, and parent support. The evaluator insisted on beginning at what Jacobs (1988) called the fourth of five tiers: assessing progress toward objectives. He forced staff, who were just beginning course development (so they were at the initiation or preimplementation stage, tier one), to articulate clear, specific, and measurable goals in behavioral terms. The staff had no previous experience writing behavioral objectives, nor was program conceptualization sufficiently advanced to concretize goals, so the evaluator formulated the objectives for the evaluation.

To the evaluator, the program seemed chaotic. How can a program operate if it doesn't know where it's going? How can it be evaluated if there are no operational objectives? His first-year evaluation rendered a negative judgment with special emphasis on what he perceived as the staff's failure to seriously attend to the behavioral objectives he had formulated. The teaching staff reacted by dismissing the evaluation as irrelevant. State education officials were also disappointed because they understood the problems of first-year programs and found the evaluation flawed in failing to help staff deal with those problems. The program staff refused to work with the same evaluator the second year and faced the prospect of a new evaluator with suspicion and hostility.

When a colleague and I became involved the second year, staff made it clear that they wanted nothing to do with behavioral objectives. The funders and school officials agreed to a formative evaluation with staff as primary users. The evaluation focused on the staff's need for information to inform ongoing, adaptive decisions aimed at program development and improvement. This meant confidential interviews with students about strengths and weaknesses of the course, observations of classes to describe interracial dynamics and student reactions, and beginning work on measures of racism and sexism. On this latter point, program staff were undecided as to

whether they were really trying to change student attitudes and behaviors or just make students more "aware." They needed time and feedback to work out satisfactory approaches to the problems of racism and sexism.

By the third year, uncertainties about student reaction and school system support had been reduced by the evaluation. Initial findings indicated support for the program. Staff had become more confident and experienced. They decided to focus on instruments to measure student changes. They were ready to deal with program outcomes as long as they were viewed as experimental and flexible.

The results of the third-year evaluation showed that students' attitudes became more racist and sexist because the course experience inadvertently reinforced students' prejudices and stereotypes. Because they helped design and administer the tests used, teachers accepted the negative findings. They abandoned the existing curriculum and initiated a whole new approach to dealing with the issues involved. By working back and forth between specific information needs, contextual goals, and focused evaluation questions, it was possible to conduct an evaluation that was used to improve the program in the second year and make an overall decision about effectiveness at the end of the third year. The key to use was matching the evaluation to the program's stage of development and the information needs of designated users as those needs changed over time.

Focusing an Evaluation

Focusing an evaluation is an interactive process between evaluators and the primary intended users of the evaluation. It can be a difficult process because deciding what will be evaluated means deciding what will not be evaluated. Programs are so complex and have so many levels, goals, and functions that there are always more potential study foci than there are resources to examine them. Moreover, as human beings, we have a limited capacity to take in data and juggle complexities. We can deal effectively with only so much at one time. The alternatives have to be narrowed and decisions made about which way to go. That's why I've emphasized the menu metaphor throughout this book. The utilization-focused evaluation facilitator is a chef offering a rich variety of choices, from full seven-course feasts to fast-food preparations (but never junk). The stage approach to evaluation involves figuring out whether, in the life of the program, it's time for breakfast, lunch, a snack, a light dinner, or a full banquet.

This problem of focus is by no means unique to program evaluation. Management consultants find that a major problem for executives is focusing their energies on priorities. The trick in meditation is learning to focus on a single mantra, koan, or image. Professors have trouble getting graduate students to analyze less than the whole of human experience in their dissertations. Time-management specialists find that people have trouble setting and sticking with priorities in both their work and personal lives. And evaluators have trouble getting intended users to focus evaluation issues.

Focusing an evaluation means dealing with several basic concerns. What is the purpose of the evaluation? How will the information be used? What will we know after the evaluation that we don't know now? What actions will we be able to take based on evaluation findings? These are not simply rote questions answered once and then put aside.

The utilization-focused evaluator keeps these questions front and center throughout the design process. The answers to these and related questions will determine everything else that happens in the evaluation. As evaluators and primary users interact around these questions, the evaluation takes shape.

The challenge is to find those "vital few" facts among the "trivial many" that are high in payoff and information load (MacKenzie 1972). The 20-80 rule expresses the importance of focusing on the right information. The 20-80 rule states that, in general, 20% of the facts account for 80% of what's worth knowing (Anderson 1980:26).

In working with intended users to understand the importance of focus, I often do a short exercise. It goes like this:

    Let me ask you to put your right hand out in front of you with your arm fully extended and the palm of your hand open. Now, focus on the center of the palm of your hand. Really look at your hand in a way that you haven't looked at it in a long time. Study the lines—some of them long, some short; some of them deep, some shallow; some relatively straight, some nicely curved, and some of them quite jagged and crooked. Be aware of the colors in your hand: reds, yellows, browns, greens, blues, different shades and hues. And notice the textures, hills and valleys, rough places and smooth. Become aware of the feelings in your hand, feelings of warmth or cold, perhaps tingling.
    Now, keeping your right hand in front of you, extend your left arm and look at your left palm in the same way, not comparatively, but just focus on the center of your left palm, studying it, seeing it, feeling it. . . . Really allow your attention to become concentrated on the center of your left palm, getting to know your left hand in a new way. (Pause.)
    Now, with both arms still outstretched, I want you to focus, with the same intensity that you've been using on each hand, I want you to focus on the center of both palms at the same time. (Pause while they try.) Unless you have quite unusual vision, you're not able to do that. There are some animals who can move their eyes independently of each other, but humans do not have that capability. We can look back and forth between the two hands, or we can use peripheral vision and glance at both hands at the same time, but we can't focus intensely on the center of both palms simultaneously.

Focusing involves a choice. The decision to look at something is also a decision not to look at something. A decision to see something means that something else will not be seen, at least not with the same acuity. Looking at your left hand or looking at your right hand or looking more generally at both hands provides you with different information and different experiences.

The same principle applies to evaluation. Because of limited time and limited resources, it is never possible to look at everything in great depth. Decisions have to be made about what's worth looking at. Choosing to look at one area in depth is also a decision not to look at something else in depth. Utilization-focused evaluation suggests that the criterion for making those choices of focus be the likely utility of the resulting information. Findings that would be of greatest use for program improvement and decision making focus the evaluation.

A Cautionary Note and Conclusion

Making use the focus of evaluation decision making enhances the likelihood of, but does not guarantee, actual use. There are no guarantees. All one can really do is
Focusing an Evaluation • 191

increase the likelihood of use. Utilization-focused evaluation is time consuming, frequently frustrating, and occasionally exhausting. The process overflows with options, ambiguities, and uncertainties. When things go wrong, as they often do, you may find yourself asking a personal evaluation question: How did I ever get myself into this craziness?

But when things go right; when decision makers care; when the evaluation question is important, focused, and on target; when you begin to see programs changing even in the midst of posing questions—then evaluation can be exhilarating, energizing, and fulfilling. The challenges yield to creativity, perseverance, and commitment as those involved engage in that most splendid of human enterprises—the application of intellect and emotion to the search for answers that will improve human effort and activity. It seems a shame to waste all that intellect and emotion studying the wrong issues. That's why it's worth taking the time to carefully focus an evaluation for optimum utility.

MENU 8.1
Alternative Ways of Focusing Evaluations

Different types of evaluations ask different questions and focus on different purposes. This
menu is meant to be illustrative of the many alternatives available. These options by no means
exhaust all possibilities. Various options can be and often are used together within the same
evaluation, or options can be implemented in sequence over a period of time, for example,
doing implementation evaluation before doing outcomes evaluation, or formative evaluation
before summative evaluation.

Focus or Type of Evaluation Defining Question or Approach

Accreditation focus Does the program meet minimum standards for accreditation
or licensing?
Causal focus Use rigorous social science methods to determine the relationship
between the program (as a treatment) and resulting outcomes
Cluster evaluation Synthesizing overarching lessons and/or impacts from a number
of projects within a common initiative or framework
Collaborative approach Evaluators and intended users work together on the evaluation
Comparative focus How do two or more programs rank on specific indicators,
outcomes, or criteria?
Compliance focus Are rules and regulations being followed?
Connoisseurship approach Specialists or experts apply their own criteria and judgment, as
with a wine or antiques connoisseur
Context focus What is the environment within which the program operates
politically, socially, economically, culturally, and scientifically?
How does this context affect program effectiveness?
Cost-benefit analysis What is the relationship between program costs and program
outcomes (benefits) expressed in dollars?
Cost-effectiveness analysis What is the relationship between program costs and outcomes
(where outcomes are not measured in dollars)?
Criterion-focused By what criteria (e.g., quality, cost, client satisfaction) shall the
evaluation program be evaluated?
Critical issues focus Critical issues and concerns of primary intended users focus the
evaluation
Decisions focus What information is needed to inform specific future decisions?
Descriptive focus What happens in the program? (No "why" questions or cause/
effect analyses)
Developmental evaluation The evaluator is part of the program design team, working
together over the long term for ongoing program development
Diversity focus The evaluation gives voice to different perspectives on and
illuminates various experiences with the program. No single
conclusion or summary judgment is considered appropriate.
Effectiveness focus To what extent is the program effective in attaining its goals?
How can the program be more effective?
Efficiency focus Can inputs be reduced and still obtain the same level of output
or can greater output be obtained with no increase in inputs?

Effort focus What are the inputs into the program in terms of number of
personnel, staff/client ratios, and other descriptors of levels of
activity and effort in the program?
Empowerment The evaluation is conducted in a way that affirms participants'
evaluation self-determination and political agenda
Equity focus Are participants treated fairly and justly?
Ethnographic focus What is the program's culture?
Evaluability assessment Is the program ready for formal evaluation? What is the feasibility
of various evaluation approaches and methods?
Extensiveness focus To what extent is the program able to deal with the total problem?
How does the present level of services and impacts compare to the
needed level of services and impacts?
External evaluation The evaluation is conducted by specialists outside the program and
independent of it to increase credibility
Formative evaluation How can the program be improved?
Goal-free evaluation What are the actual effects of the program on clients (without
regard to what staff say they want to accomplish)? To what extent
are real needs being met?
Goals-based focus To what extent have program goals been attained?
Impact focus What are the direct and indirect program impacts, not only on
participants, but also on larger systems and the community?
Implementation focus To what extent was the program implemented as designed? What
issues surfaced during implementation that need attention in the
future?
Inputs focus What resources (money, staff, facilities, technology, etc.) are available
and/or necessary?
Internal evaluation Program employees conduct the evaluation
Intervention-oriented Design the evaluation to support and reinforce the program's desired
evaluation results
Judgment focus Make an overall judgment about the program's merit or worth
(see also summative evaluation)
Knowledge focus What can be learned from this program's experiences and results to
(or Lessons Learned) inform future efforts?
Logical framework Specify goals, purposes, outputs, and activities, and connecting
assumptions; for each, specify indicators and means of verification
Longitudinal focus What happens to the program and to participants over time?
Meta-evaluation Was the evaluation well done? Is it worth using? Did the evaluation
meet professional standards and principles?
Mission focus To what extent is the program or organization achieving its overall
mission? How well do outcomes of departments or programs within
an agency support the overall mission?
Monitoring focus Routine data collected and analyzed routinely on an ongoing basis,
often through a management information system
Needs assessment What do clients need and how can those needs be met?
Needs-based evaluation See Goal-free evaluation


Norm-referenced approach How does this program population compare to some specific
norm or reference group on selected variables?
Outcomes evaluation To what extent are desired client/participant outcomes being
attained? What are the effects of the program on clients or
participants?
Participatory evaluation Intended users, usually including program participants and/or
staff, are directly involved in the evaluation
Personnel evaluation How effective are staff in carrying out their assigned tasks and
in accomplishing their assigned or negotiated goals?
Process focus What do participants experience in the program? What are strengths
and weaknesses of day-to-day operations? How can these processes
be improved?
Product evaluation What are the costs, benefits, and market for a specific product?
Quality assurance Are minimum and accepted standards of care being routinely and
systematically provided to patients and clients? How can quality
of care be monitored and demonstrated?
Questions focus What do primary intended users want to know that would make
a difference to what they do? The evaluation answers questions
instead of making judgments
Reputation focus How the program is perceived by key knowledgeables and
influentials; ratings of the quality of universities are often based
on reputation among peers
Responsive evaluation What are the various points of view of different constituency groups
and stakeholders? The responsive evaluator works to capture,
represent, and interpret these varying perspectives under the
assumption each is valid and valuable
Social and community What routine social and economic data should be monitored to
indicators assess the impacts of this program? What is the connection between
program outcomes and larger-scale social indicators, for example,
crime rates?
Social justice focus How effectively does the program address social justice concerns?
Summative evaluation Should the program be continued? If so, at what level? What is the
overall merit and worth of the program?
Theory-driven focus On what theoretical assumptions and model is the program based?
What social scientific theory is the program a test of and to what
extent does the program confirm the theory?
Theory of action What are the linkages and connections between inputs, activities,
approach immediate outcomes, intermediate outcomes, and ultimate impacts?
Utilization-focused What information is needed and wanted by primary intended users
evaluation that will actually be used for program improvement and decision
making? (Utilization-focused evaluation can include any of the other
types above.)

Implementation Evaluation:
What Happened in the Program?

If your train's on the wrong track, every station you come to is the wrong station.
—Bernard Malamud

An old story is told that through a series of serendipitous events, much too convoluted
and incredible to sort out here, four passengers found themselves together in a small
plane—a priest; a young, unemployed college dropout; the world's smartest person;
and the President of the United States. At 30,000 feet, the pilot suddenly announced
that the engines had stalled, the plane was crashing, and he was parachuting out. He
added as he jumped, "I advise you to jump too, but I'm afraid there are only three
parachutes left. . . . " With that dire news, he was gone.
The world's smartest person did the fastest thinking, grabbed a parachute, and
jumped. The President of the United States eyed the other two, put on a parachute, and
said as he jumped, "You understand, it's not for myself but for the country."
The priest looked immensely uneasy as he said, "Well, my son, you're young, and
after all I am a priest, and, well, it seems only the right thing to do, I mean, if you want,
um, just, um, go ahead, and um, well. . . . "
The college dropout smiled and handed the priest a parachute. "Not to worry,
Reverend. There's still a parachute for each of us. The world's smartest person grabbed
my backpack when he jumped."


Checking the Inventory

Programs, like airplanes, need all their parts to do what they're designed to do and accomplish what they're supposed to accomplish. Programs, like airplanes, are supposed to be properly equipped to carry out their assigned functions and guarantee passenger (participant) safety. Programs, like airplanes, are not always so equipped. Regular, systematic evaluations of inventory and maintenance checks help avoid disasters in both airplanes and programs.

Implementation evaluation focuses on finding out if the program has all its parts, if the parts are functional, and if the program is operating as it's supposed to be operating. Implementation evaluation can be a major evaluation focus. It involves finding out what actually is happening in the program. Of what does the program consist? What are the program's key characteristics? Who is participating? What do staff do? What do participants experience? What's working and what's not working? What is the program? Menu 9.1 at the end of this chapter provides additional implementation questions. (For a larger menu of over 300 implementation evaluation questions, see King, Morris, and Fitz-Gibbon 1987:129-41.)

An Exemplar

Our follow-up study of federal health evaluations turned up one quite dramatic case of evaluation use with important implementation lessons. A state legislature established a program to teach welfare recipients the basic rudiments of parenting and household management. Under this mandate, the state welfare department was charged with conducting workshops, distributing brochures, showing films, and training caseworkers on how low-income people could better manage their meager resources and become better parents. A single major city was selected for pilot-testing the program, with a respected independent research institute contracted to evaluate the program. Both the state legislature and the state welfare department committed themselves publicly to using the evaluation findings for decision making.

The evaluators interviewed a sample of welfare recipients before the program began, collecting data about parenting, household management, and budgetary practices. Eighteen months later, they interviewed the same welfare recipients again. The results showed no measurable change in parenting or household management behavior. The evaluators judged the program ineffective, a conclusion they reported to the state legislature and the newspapers. Following legislative debate and adverse publicity, the legislature terminated funding for the program—a dramatic case of using evaluation results to inform a major decision.

Now suppose we want to know why the program was ineffective. The evaluation as conducted shed no light on what went wrong because it focused entirely on measuring the attainment of intended program outcomes: changed parenting and household management behaviors of welfare recipients. As it turns out, there is a very good reason why the program didn't attain the desired outcomes. It was never implemented.

When funds were allocated from the state to the city, the program immediately became embroiled in the politics of urban welfare. Welfare rights organizations questioned the right of government to tell poor people how to spend their money or rear their children: "You have no right to tell us we have to run our houses like the white middle-class parents. And who's this Frenchman Piaget who's going to tell us how to raise American kids?"

These and other political battles delayed program implementation. Procrastination being the better part of valor, no parenting brochures were ever printed; no household management films were ever shown; no workshops were held; and no caseworkers were ever hired or trained.

In short, the program was never implemented. But it was evaluated! It was found to be ineffective—and was killed.

The Importance of Implementation Analysis

It is important to know the extent to which a program attains intended outcomes and meets participant needs, but to answer those questions it is essential to know what occurred in the program that can reasonably be connected to outcomes. The primer How to Assess Program Implementation (King et al. 1987) puts it this way:

To consider only questions of program outcomes may limit the usefulness of an evaluation. Suppose the data suggest emphatically that the program was a success. You can say, "It worked!" But unless you have taken care to describe the details of the program's operations, you may be unable to answer a question that logically follows such a judgment of success: "What worked?" If you

cannot answer that, you will have wasted the effort measuring the outcomes of events that cannot be described and therefore remain a mystery. . . .

If this happens to you, you will not be alone. As a matter of fact, you will be in good company. Few evaluation reports pay enough attention to describing the processes of a program that helped participants achieve its outcomes. (p. 9; emphasis in the original)

Not knowing enough about implementation limits the usefulness of findings about effective programs and compounds misunderstandings about what is often called "the human services shortfall: the large and growing gap between what we expect from government-supported human service systems and what these systems in fact deliver" (Lynn and Salasin 1974:4). The human services shortfall is made up of two parts: (1) failure of implemented programs to attain desired outcomes and (2) failure to actually implement policy in the form of operating programs. In the early days of evaluation, evaluators directed most of their attention to the first problem by conducting outcomes evaluations. That practice began to change in the face of evidence that the second problem was equally, if not even more, critical. In a classic study of social program implementation, Walter Williams concluded, "The lack of concern for implementation is currently the crucial impediment to improving complex operating programs, policy analysis, and experimentation in social policy areas" (Williams and Elmore 1976:267; emphasis in original).

The fundamental implementation question remains whether or not what has been decided actually can be carried out in a manner consonant with that underlying decision. More and more, we are finding, the answer is no.

It is not just that the programs fall short of the early rhetoric that described them; they often barely work at all. . . . Indeed, it is possible that past analysis and research that ignored implementation issues may have asked the wrong questions, thereby producing information of little or no use to policy making. (Williams and Elmore 1976:xi-xii; emphasis in the original)

The notion that asking the wrong questions will result in useless information is fundamental to utilization-focused evaluation. To avoid gathering useless information about outcomes, it is important to frame evaluation questions in the context of program implementation. Data on why this is critical come from many sources. At the international level, studies collected and edited by John C. de Wilde (1967) demonstrated that program implementation and administration were the critical problems in developing countries. Organizational sociologists have documented the problems that routinely arise in implementing programs that are new and innovative alongside or as part of existing programs (e.g., Kanter 1983; Corwin 1973; Hage and Aiken 1970). Researchers studying diffusion of innovations have thoroughly documented the problems of implementing new ideas in new settings (e.g., Brown 1981; Havelock 1973; Rogers and Shoemaker 1971; Rogers and Svenning 1969). Then there's the marvelous case study of the Oakland Project by Pressman and Wildavsky (1984). Now a classic on the trials and tribulations of implementation, this description of a Great Society urban development effort is entitled:

IMPLEMENTATION
How Great Expectations in
Washington Are Dashed in Oakland;
Or, Why It's Amazing That
Federal Programs Work at All,
This Being a Saga of the Economic
Development Administration as Told
by Two Sympathetic Observers
Who Seek to Build Morals on a
Foundation of Ruined Hopes

Focus on Utility: Information for Action and Decisions

The problem with pure outcomes evaluation is that the results give decision makers little information to guide action. Simply learning that outcomes are high or low doesn't tell decision makers much about what to do. They also need to understand the nature of the program. In the example that opened this chapter, legislators learned that targeted welfare parents showed no behavioral changes, so they terminated the program. The evaluators failed to include data on implementation that would have revealed the absence of any of the mandated activities that were supposed to bring about the desired changes. By basing their decision only on outcomes information, the legislators terminated a policy approach that had never actually been tried. This was not a unique case.

Although it seems too obvious to mention, it is important to know whether a program actually exists. Federal agencies are often inclined to assume that, once a cash transfer has taken place from a government agency to a program in the field, a program exists and can be evaluated. Experienced evaluation researchers know that the very existence of a program cannot be taken for granted, even after large cash transfers have taken place. Early evaluations of Title I programs in New York City provide an illustration of this problem. (Guttentag and Struening 1975b:3-4)

Terminating a policy inappropriately is only one possible error when outcomes data are used without data about implementation. Expanding a successful program inappropriately is also possible when decision makers lack information about the basis for the program's success. In one instance, a number of drug addiction treatment centers in a county were evaluated based on rates of readdiction for treated patients. All had relatively mediocre success rates except one program that reported a 100% success rate over two years. The county board immediately voted to triple the budget of that program. Within a year, the readdiction rates for that program had fallen to the same mediocre level as other centers. By enlarging the program, the county board had eliminated the key elements in the program's success—its small size and dedicated staff. It had been a six-patient halfway house with one primary counselor who ate, slept, and lived that program. He established such a close relationship with each addict that he knew exactly how to keep each one straight. When the program was enlarged, he became administrator of three houses and lost personal contact with the clients. The successful program became mediocre. A highly effective program was lost because the county board acted without understanding the basis for the program's success.

Renowned global investor and philanthropist George Soros tells a similar story. Through a foundation he established in Moscow when the Cold War thawed, he

funded a successful program aimed at transforming the education system. "I wanted to make it bigger, so I threw a lot of money at it—and in so doing, I destroyed it, effectively. It was too much money" (quoted by Buck 1995:76-77).

If, because of limited time and evaluation resources, one had to choose between implementation evaluation and outcomes measurement, there are instances in which implementation assessment would be of greater value. Decision makers can use implementation monitoring to make sure that a policy is being put into operation according to design or to test the very feasibility of the policy.

For example, Leonard Bickman (1985) has described a statewide evaluation of early childhood interventions in Tennessee that began by asking stakeholders in state government what they wanted to know. The evaluators were prepared to undertake impact studies, and they expected outcomes data to be the evaluation priority. However, interviews with stakeholders revealed a surprising sophistication about the difficulties and expenses involved in getting good, generalizable outcomes data in a timely fashion. Moreover, it was clear that key policymakers and program managers "were more concerned about the allocation and distribution of resources than about the effectiveness of projects" (p. 190). They wanted to know whether every needy child was being served. What services were being delivered to whom? State agencies could use this kind of implementation and service delivery information to "redistribute their resources to unserved areas and populations or encourage different types of services" (p. 191). They could also use descriptive information about programs to increase communications among service providers about what ideas were being tried and to assess gaps in services. Before "the more sophisticated (and expensive) questions about effectiveness" were asked, "policymakers wanted to know simpler descriptive information. . . . If the currently funded programs could not even be described, how could they be improved?" (Bickman 1985:190-91).

Unless one knows that a program is operating according to design, there may be little reason to expect it to produce the desired outcomes. Furthermore, until the program is implemented and a "treatment" is believed to be in operation, there is little reason to evaluate outcomes. This is another variation on Donald Campbell's (1983) admonition to evaluate no program until it is proud, by which he meant that demanding summative outcomes evaluation should await program claims and supporting evidence that something worth rigorous evaluation is taking place.

Ideal Program Plans and Actual Implementation

Why is implementation so difficult? Part of the answer appears to lie with how programs are legislated and planned. Policymakers seldom seem to analyze the feasibility of implementing their ideas during decision making (W. Williams 1976:270). This ends up making the task of evaluation all the more difficult because implementation is seldom clearly conceptualized. As a result, either as part of evaluability assessment or in early interactions with primary intended users, the evaluator will often have to facilitate discussion of what the program should look like before it can be said to be fully implemented and operational. Criteria for evaluating implementation will have to be developed.

Implementation evaluation is further complicated by the finding that programs are rarely implemented by single-mindedly adopting a set of means to achieve predetermined ends. The process simply isn't that rational or logical. More common is some degree of incremental implementation in which a program takes shape slowly and adaptively in response to the emerging situation and early experiences. For example, Jerome Murphy (1976:96) found, in studying implementation of Title V of the Elementary and Secondary Education Act, that states exhibited great variation in implementation. He found no basis for the widespread assumption that competently led bureaucracies would operate like goal-directed, unitary decision makers. Instead, implementers at the field level did what made sense to them rather than simply following mandates from higher up; moreover, the processes of implementation were more political and situational than rational and logical.

Sociologists who study formal organizations, social change, and diffusion of innovations have carefully documented the substantial slippage in organizations between plans and actual operations. Design, implementation, and routinization are stages of development during which original ideas are changed in the face of what's actually possible (Kanter 1983; Hage and Aiken 1970; Mann and Neff 1961; Smelser 1959). Even where planning includes a trial period, what gets finally adopted typically varies from what was tried out in the pilot effort (Rogers 1962). Social scientists who study change and innovation emphasize two points: (1) routinization or final acceptance is never certain at the beginning; and (2) the implementation process always contains unknowns that change the ideal so that it looks different when and if it actually becomes operational.

Barriers to Implementation

Understanding some of the well-documented barriers to implementation can help evaluators ask appropriate questions and generate useful information for program adaptation and improvement. For example, organizational conflict and disequilibrium often increase dramatically during the implementation stage of organizational change. No matter how much planning takes place, "people problems" will arise.

The human element is seldom adequately considered in the implementation of a new product or service. There will be mistakes that will have to be corrected. . . . In addition, as programs take shape power struggles develop. The stage of implementation is thus the stage of conflict, especially over power. . . . Tempers flare, interpersonal animosities develop, and the power structure is shaken. (Hage and Aiken 1970:100, 104)

Odiorne (1984:190-94) dissected "the anatomy of poor performance" in managing change and found gargantuan human obstacles including staff who give up when they encounter trivial obstacles, people who hang onto obsolete ideas and outmoded ways of doing things, emotional outbursts when asked to perform new tasks, muddled communications, poor anticipation of problems, and delayed action when problems arise so that once manageable problems become major management crises.

Meyers (1981:37-39) has argued that much implementation fails because program designs are "counterintuitive"—they just don't make sense. He adds to the litany of implementation hurdles the following: undue haste, compulsion to spend all allotted funds by the end of the fiscal year,

personnel turnovers, vague legislation, severe understaffing, racial tensions, conflicts between different levels of government, and the divorce of implementation from policy.

The difference between the ideal, rational model of program implementation and the day-to-day, incrementalist, and conflict-laden realities of program implementation is explained without resort to jargon in this notice found by Jerome Murphy (1976) in the office of a state education agency:

The objective of all dedicated
department employees should
be to thoroughly analyze all
situations, anticipate all problems
prior to their occurrence,
have answers for these problems,
and move swiftly to solve these
problems when called upon. . . .

However . . .

When you are up to your ass in
alligators, it is difficult to remind
yourself that your initial objective
was to drain the swamp. (p. 92)

The Case of Project Follow Through

Failing to understand that implementation of program ideals is neither automatic nor certain can lead to evaluation disaster, not only resulting in lack of use, but discrediting the entire evaluation effort. The national evaluation of Follow Through is a prime example. Follow Through was introduced as an extension of Head Start for primary-age children. It was a "planned variation experiment" in compensatory education featuring 22 different models of education to be tested in 158 school districts on 70,000 children throughout the nation. The evaluation employed 3,000 people to collect data on program effectiveness. The evaluation started down the path to trouble when the designers "simply assumed in the evaluation plan that alternative educational models could and would be implemented in some systematic, uniform fashion" (Alkin 1970:2). This assumption quickly proved fallacious.

Each sponsor developed a large organization, in some instances larger than the entire federal program staff, to deal with problems of model implementation. Each local school system developed a program organization consisting of a local director, a team of teachers and specialists, and a parent advisory group. The more the scale and complexity of the program increased, the less plausible it became for Follow Through administrators to control the details of program variations, and the more difficult it became to determine whether the array of districts and sponsors represented "systematic" variations in program content. (Williams and Elmore 1976:108)

The Follow Through results revealed greater variation within models than between them; that is, the 22 models did not show systematic treatment effects as such. Most effects were null, some were negative, but "of all our findings, the most pervasive, consistent, and suggestive is probably this: The effectiveness of each Follow Through model depended more on local circumstances than on the nature of the model" (Anderson 1977:13; emphasis in original). In reviewing these findings, Eugene Tucker (1977) of the U.S. Office of Education suggested that, in retrospect, the Follow Through evaluation should have begun as a formative effort
with greater focus on implementation strategies:

It is safe to say that evaluators did not know what was implemented in the various sites. Without knowing what was implemented, it is virtually impossible to select valid effectiveness measures. . . . Hindsight is a marvelous teacher and in large-scale experimentations an expensive one. (pp. 11-12)

Implementation Evaluation • 203

Ideals and Discrepancies

Provus (1971:27-29) had warned against the design used in the Follow Through evaluation at a 1966 conference on educational evaluation of national programs:

An evaluation that begins with an experimental design denies to program staff what it needs most: information that can be used to make judgments about the program while it is in its dynamic stages of growth. . . . Evaluation must provide administrators and program staff with the information they need and the freedom to act on that information. . . .

We will not use the antiseptic assumptions of the research laboratory to compare children receiving new program assistance with those not receiving such aid. We recognize that the comparisons have never been productive, nor have they facilitated corrective action. The overwhelming number of evaluations conducted in this way show no significant differences between "experimental" and "control" groups. (pp. 11-12)

Instead, Provus (1971) advocated "discrepancy evaluation," an approach that compares the actual with the ideal and places heavy emphasis on implementation evaluation. He argued that evaluations should begin by establishing the degree to which programs are actually operating as desired. Conceptualization of ideals "may arise from any source, but under the Discrepancy Evaluation Model they are derived from the values of the program staff and the client population it serves" (p. 12). Data to compare actual practices with ideals would come from local fieldwork "of the process assessment type" in which evaluators systematically collect and weigh data descriptive of ongoing program activity (p. 13).

Given the reality that actual implementation will typically look different from original ideas, a primary evaluation challenge is to help identified decision makers determine how far from the ideal the program can deviate, and in what ways it can deviate, while still constituting the original idea (as opposed to the original ideal). In other words, a central evaluation question is: How different can an actual program be from its ideal and still be said to have been implemented? The answer must be clarified between primary intended users and evaluators as part of the process of specifying criteria for assessing implementation.

At some point, there should be a determination of the degree to which an innovation has been implemented successfully. What should the implemented activity be expected to look like in terms of the underlying decision? For a complex treatment package put in different local settings, decision makers usually will not expect—or more importantly, not want—a precise reproduction of every detail of the package. The objective is performance, not conformance. To enhance the probability of achieving the basic program or policy objectives, implementation should consist of a realistic development of the underlying decision in terms of the local setting. In the ideal situation, those responsible for implementation would take the basic idea and modify it to meet special local conditions. There should be a reasonable resemblance to the basic idea, as measured by inputs and expected outputs, incorporating the best of the decision and the best of the local ideas. (Williams and Elmore 1976:277-78)

The implementation of the Oregon Community Corrections Act offers an excellent illustration of how local people can adapt a statewide mandate to fit local needs and initiatives. In studying variations in implementation of this legislation, Palumbo, Maynard-Moody, and Wright (1984) found a direct relationship between higher levels of implementation and success in attaining goals. Yet, "the implementation factors that lead to more successful outcomes are not things that can easily be transferred from one locale to another" (p. 72).

Local Variations in Implementing National Programs

I would not belabor these points if it were not so painfully clear that implementation processes have been ignored so frequently in evaluations. Edwards et al. (1975) lamented that "we have frequently encountered the idea that a [national] program is a fixed, unchanging object, observable at various times and places" (p. 142). Because this idea seems so firmly lodged in so many minds and spawns so many evaluation designs with reduced utility, I feel compelled to offer one more piece of evidence to the contrary.

Rand Corporation, under contract to the U.S. Office of Education, studied 293 federal programs supporting educational change—one of the largest and most comprehensive studies of educational change ever conducted. The study concluded that implementation "dominates the innovative process and its outcomes":

In short, where implementation was successful, and where significant change in participant attitudes, skills, and behavior occurred, implementation was characterized by a process of mutual adaptation in which project goals and methods were modified to suit the needs and interests of the local staff and in which the staff changed to meet the requirements of the project. This finding was true even for highly technological and initially well-specified projects; unless adaptations were made in the original plans or technologies, implementation tended to be superficial or symbolic, and significant change in participants did not occur. (McLaughlin 1976:169)

The Change Agent Study found that the usual emphasis in federal programs on the delivery system is inappropriate. McLaughlin (1976) recommended

a shift in change agent policies from a primary focus on the delivery system to an emphasis on the deliverer. An important lesson that can be derived from the Change Agent Study is that unless the developmental needs of the users are addressed, and unless projects are modified to suit the needs of the user and the institutional setting, the promise of new technologies is likely to be unfulfilled. (p. 180; emphasis in original)

The emphasis on the "user" in the Rand study brings us back to the importance of the personal factor and attention to primary intended users in evaluation of implementation processes. Formative, improvement-oriented evaluations can help users make the kinds of program
adaptations to local conditions that Rand found so effective. That is, evaluation can be a powerful tool for guiding program development during implementation; it can facilitate initial judgments about the connections between program activities and outcomes. But implementation evaluation, like program innovation, must also be adaptive and focused on users if the process and results are to be relevant, meaningful, and useful. Utilization-focused criteria for evaluating implementation must be developed through interaction with primary intended users. Evaluation facilitators will have to be active-reactive-adaptive in framing evaluation questions in the context of program implementation.

Variations and Options in Implementation Evaluation

In working with intended users to focus evaluation questions, several alternative types of implementation evaluation can be considered, many of which can be used in combination. These options deal with different issues. Over time, a comprehensive evaluation might include all five types of implementation evaluation reviewed below.

Effort Evaluation

Effort evaluations focus on documenting "the quantity and quality of activity that takes place. This represents an assessment of input or energy regardless of output. It is intended to answer the questions 'What did you do?' and 'How well did you do it?' " (Suchman 1967:61). Effort evaluation moves up a step from asking if the program exists to asking how active the program is. If relatively inactive, it is unlikely to be very effective.

Effort questions include: Have sufficient staff been hired with the proper qualifications? Are staff-client ratios at desired levels? How many clients with what characteristics are being served by the program? Are necessary materials available? An effort evaluation involves making an inventory of program operations.

Tripodi et al. (1971) have linked effort evaluations to stages of program development. At initiation of a program, evaluation questions focus on getting services under way. Later, questions concerning the appropriateness, quantity, and quality of services become more important.

Monitoring Programs: Routine Management Information

Monitoring has become an evaluation specialization (Grant 1978). An important way of monitoring implementation over time is to establish a management information system (MIS). This provides routine data on client intake, participation levels, program completion rates, caseloads, client characteristics, and program costs. The hardware and software decisions for an MIS have long-term repercussions, so the development of such a routine data collection system must be approached with special attention to questions of use and problems of managing management information systems (Patton 1982b). Establishing and using an MIS are often primary responsibilities of internal evaluators. This has been an important growth area in the field of evaluation as demands for accountability have increased in human services (Attkisson et al. 1978; Broskowski, Driscoll, and Schulberg 1978; Elpers and Chapman 1978). The "monitoring and tailoring" approach of Cooley and Bickel (1985) demonstrates how an MIS can be client oriented and utilization focused.

Problems in implementing an MIS can lead to a MIS-match (Dery 1981). While there has been no shortage of documented MIS problems and disasters (Lucas 1975), computers and data-based management information systems have brought high technology and statistical process control to programs of all kinds (Cranford 1995; Posavac 1995; Richter 1995). The trick is to design them to be useful—and then actually get them used. Utilization-focused evaluators can play an important facilitative role in such efforts.

Process Evaluation

Process evaluation focuses on the internal dynamics and actual operations of a program in an attempt to understand its strengths and weaknesses. Process evaluations ask: What's happening and why? How do the parts of the program fit together? How do participants experience and perceive the program? This approach takes its name from an emphasis on looking at how a product or outcome is produced rather than looking at the product itself; that is, it is an analysis of the processes whereby a program produces the results it does. Process evaluation is developmental, descriptive, continuous, flexible, and inductive (Patton 1980a).

Process evaluations search for explanations of the successes, failures, and changes in a program. Under field conditions in the real world, people and unforeseen circumstances shape programs and modify initial plans in ways that are rarely trivial. The process evaluator sets out to understand and document the day-to-day reality of the setting or settings under study. This means unraveling what is actually happening in a program by searching for the major patterns and important nuances that give the program its character. A process evaluation requires sensitivity to both qualitative and quantitative changes in programs throughout their development; it means becoming intimately acquainted with the details of the program. Process evaluations not only look at formal activities and anticipated outcomes, but also investigate informal patterns and unanticipated consequences in the full context of program implementation and development.

Finally, process evaluations usually include perceptions of people close to the program about how things are going. A variety of perspectives may be sought from people inside and outside the program. For example, process data for a classroom can be collected from students, teachers, parents, staff specialists, and administrators. These differing perspectives can provide unique insights into program processes as experienced and understood by different people.

A process evaluation can provide useful feedback during the developmental phase of a program as well as later, in providing details for diffusion and dissemination of an effective program. One evaluator in our study of the utilization of federal health evaluations reported that process information had been particularly useful to federal officials in expanding a program nationwide. Process data from early pilot efforts were used to inform the designs of subsequent centers as the program expanded.

Process evaluation is one of the four major components of the CIPP (context, input, process, product) model of evaluation developed by Stufflebeam et al. (1971; Stufflebeam and Guba 1970). It
involves (1) gathering data to detect or predict defects in the procedural design or its implementation during the implementation stages, (2) providing information for program decisions, and (3) establishing a record of program development as it occurs.

Component Evaluation

The component approach to implementation involves a formal assessment of distinct parts of a program. Programs can be conceptualized as consisting of separate operational efforts that may be the focus of a self-contained implementation evaluation. For example, the Hazelden Foundation Chemical Dependency Program typically includes the following components: detoxification, intake, group treatment, lectures, individual counseling, release, and outpatient services. While these components make up a comprehensive chemical dependency treatment program that can be and is evaluated on the outcome of continued sobriety over time (Laundergan 1983; Patton 1980b), there are important questions about the operation of any particular component that can be the focus of evaluation, either for improvement or to decide if that component merits continuation. In addition, linkages between one or more components may become the focus of evaluation.

Bickman (1985) has argued that one particularly attractive feature of the component approach is the potential for greater generalizability of findings and more appropriate cross-program comparisons:

The component approach's major contribution to generalizability is its shift from the program as the unit of analysis to the component. By reducing the unit of analysis to a component instead of a program, it is more likely that the component as contrasted to entire programs can be generalized to other sites and other providers. The more homogeneous units are, the more likely one can generalize from one unit to another. In principle, the smaller the unit of analysis within a hierarchy, the more homogeneous it will be. By definition, as programs are composed of components, programs are more heterogeneous than components. It should be easier to generalize from one component to another than to generalize from one program to another.

An example of this process might clarify the point. Any two early childhood programs may consist of a variety of components implemented in several different ways. Knowledge of the success of one program would not tell us a great deal about the success of the other unless they were structurally similar. However, given the diversity of programs, it is unlikely that they would have the same type and number of components. In contrast, if both had an intake component, it would be possible to compare them just on that component. A service provider in one part of the state can examine the effectiveness of a particular component in an otherwise different program in a different part of the state and see its relevance to the program he or she was directing. (p. 199)

Treatment Specification

Treatment specification involves identifying and measuring precisely what it is about a program that is supposed to have an effect. It means conceptualizing the program as a carefully defined intervention or treatment—or at least finding out if there's enough consistency in implementation to permit such a conceptualization. This requires elucidation of the "theory" program staff hold about what they have to do in order to accomplish the results they want. In technical terms, this means identifying independent variables that are expected to affect outcomes (the dependent variables). Treatment specification reveals the causal assumptions undergirding program activity.

Measuring the degree to which conceptualized treatments actually occur can be a tricky and difficult task laden with methodological and conceptual pitfalls:

Social programs are complex undertakings. Social program evaluators look with something akin to jealousy at evaluators in agriculture who evaluate a new strain of wheat or evaluators in medicine who evaluate the effects of a new drug. . . . The same stimulus can be produced again, and other researchers can study its consequences—under the same or different conditions, with similar or different subjects, but with some assurance that they are looking at the effects of the same thing.

Social programs are not nearly so specific. They incorporate a range of components, styles, people, and procedures. . . . The content of the program, what actually goes on, is much harder to describe. There are often marked internal variations in operation from day to day and from staff member to staff member. When you consider a program as large and amorphous as the poverty program or the model cities program, it takes a major effort to just describe and analyze the program inputs. (Weiss 1972b:43)

Yet, unless basic data are generated about the program as an intervention, the evaluator does not know to what to attribute the outcomes observed. This is the classic problem of treatment specification in social science research and, of course, takes us into the arena of trying to establish causality.

Any new program or project may be thought of as representing a theory or hypothesis in that—to use experimental terminology—the decision maker wants to put in place a treatment expected to cause certain predicted effects or outcomes. (Williams and Elmore 1976:274; emphasis in original)

From this perspective, one task of implementation evaluation is to identify and operationalize the program treatment.

Some comparative or experimental design evaluations fall into the trap of relying on the different names programs call themselves—their labels or titles—to distinguish different treatments. Because this practice yields data that can easily be misunderstood and misused, the next section explores the problem in greater depth.

The Challenge of Truth-in-Labeling

Warning: This section sermonizes on the Pandorian folly attendant upon those who believe program titles and names. What a program calls its intervention is no substitute for gathering actual data on program implementation. Labels are not treatments.

I suspect that overreliance on program labels is a major source of null findings in evaluation research. Aggregating results under a label can lead to mixing effective with ineffective programs that have nothing in common except their name. An evaluation of Residential Community Corrections Programs in Minnesota offers a case in point. The report, prepared by the Evaluation Unit of the Governor's Commission on Crime Prevention and Control, compared recidivism rates for three
"types" of programs: (1) halfway houses, (2) PORT (Probationed Offenders Rehabilitation and Training) projects, and (3) juvenile residences. The term halfway house referred to a "residential facility designed to facilitate the transition of paroled adult ex-offenders returning to society from institutional confinement." This distinguished halfway houses from juvenile residences, which served only juveniles. Offenders on probation were the target of the PORT projects (GCCPC 1976:8). What we have, then, are three different target groups, not three different treatments.

The report presented aggregated outcome data for each type of community corrections program, thereby combining the results for projects about which they had no systematic implementation data. In effect, they compared the outcomes of three labels: halfway houses, PORT projects, and juvenile residences. Nowhere in the several hundred pages of the report was there any systematic data about the activities offered in these programs. People went in and people came out; what happened in between was ignored by the evaluators. The evaluation concluded that "the evidence presented in this report indicates that residential community corrections programs have had little, if any, impact on the recidivism of program clients" (GCCPC 1976:289). These preliminary findings resulted in a moratorium on funding of new residential community corrections, and the final report recommended maintaining that moratorium. With no attention to the meaningfulness of their analytical labels, and with no treatment specifications, the evaluators passed judgment on the effectiveness of an $11 million program.

The aggregated comparisons were essentially meaningless. When I interviewed staff in a few of these community corrections projects, it became clear that halfway houses varied tremendously in treatment modality, clientele, and stage of implementation. The report's comparisons were based on averages within the three types of programs, but the averages disguised important variations within each type. No "average" project existed, yet the different programs of like name were combined for comparative purposes. Within types, the report obscured individual sites that were doing excellent work as well as some of dubious quality.

One has only to read the journals that publish evaluation findings to find similar studies. There are comparisons between "open" schools and "traditional" schools that present no data on relative openness. There are comparisons of individual therapy with group therapy where no attention is paid to the homogeneity of either category of treatment.

A common administrative fiction, especially in Washington, is that because some money associated with an administrative label (e.g., Head Start) has been spent at several places and over a period of time, that the entities spending the money are comparable from time to time and from place to place. Such assumptions can easily lead to evaluation-research disasters. (Edwards et al. 1975:142)

Treatment Specification: An Alternative to Labeling

A newspaper cartoon showed several federal bureaucrats assembled around a table in a conference room. The chair of the group was saying, "Of course the welfare program has a few obvious flaws . . . but if we can just think of a catchy enough name for it, it just might work!" (Dunagin 1977).

Treatment specification means getting behind labels to state what is going to happen in the program that is expected to make a difference. For example, one theory undergirding community corrections has been that integration of criminal offenders into local communities is the best way to rehabilitate those offenders and thereby reduce recidivism. It is therefore important to gather data about the degree to which each project actually integrates offenders into the community. Halfway houses and juvenile residences can be run like small-scale prisons, completely isolated from the environment. Treatment specification tells us what to look for in each project to find out if the program's causal theory is actually being put to the test. (At this point we are not dealing with the question of how to measure the relevant independent variables in a program theory, but only attempting to specify the intended treatment in nominal terms.)

Here's an example of how treatment specification can be useful. A county Community Corrections Department in Minnesota wanted to evaluate its foster group-home program for juvenile offenders. The primary information users lacked systematic data about what the county's foster group homes were actually like. The theory undergirding the program was that juvenile offenders would be more likely to be rehabilitated if they were placed in warm, supportive, and nonauthoritarian environments where they were valued by others and could therefore learn to value themselves. The goals of the program included helping juveniles feel good about themselves and become capable of exercising independent judgment, thereby reducing subsequent criminal actions (recidivism).

The evaluation measured both outcomes and implementation with special attention to treatment environment. What kind of treatment is a youth exposed to in a group home? What are the variations in group homes? Do certain types of foster group homes attain better results, both providing positive experiences for youth and reducing recidivism?

The findings revealed that the environments of the sample of 50 group homes could be placed along a continuum from highly supportive and participatory home environments to nonsupportive and authoritarian ones. Homes were about evenly distributed along the continua of support versus nonsupport and participatory versus authoritarian patterns; that is, about half the juveniles experienced homes with measurably different climates. Juveniles from supportive-participatory group homes showed significantly lower recidivism rates than juveniles from nonsupportive-authoritarian ones (r = .33, p < .01). Variations in type of group-home environment were also correlated significantly with other outcome variables (Patton, Guthrie, et al. 1977).

In terms of treatment specification, these data demonstrated two things: (1) in about half of the county's group homes, juveniles were not experiencing the kind of treatment that the program design called for; and (2) outcomes varied directly with the nature and degree of program implementation. Clearly it would make no sense to conceptualize these 50 group homes as a homogeneous treatment. We found homes that were run like prisons and homes in which juveniles were physically abused. We also found homes where young offenders were loved and treated as members of the family. Aggregating recidivism data from all 50 homes into a single average rate would disguise important environmental variations. By specifying the desired treatment and measuring implementation compliance, the program's theory could be examined in terms of both feasibility and effectiveness.

EXHIBIT 9.1
Format for Connecting Goals With Implementation Plans and Measurement

(Four column headings, with blank rows for staff to complete:)
1. Goals: Expected Client Outcomes
2. Indicators: Outcome Data/Measurement Criteria
3. How Goals Will Be Attained (Implementation Strategies)
4. Data on Implementation Criteria
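Where staff keep such a matrix electronically, each row can be mirrored in a small record type. The sketch below is illustrative only (the class and field names are invented here, not part of the exhibit); its one method simply flags blank cells as candidates for discussion with intended users.

```python
from dataclasses import dataclass, field

@dataclass
class GoalRow:
    """One row of a goals-and-implementation matrix (illustrative names)."""
    expected_outcome: str                                    # goals: expected client outcomes
    outcome_indicators: list = field(default_factory=list)   # outcome data/measurement criteria
    strategies: list = field(default_factory=list)           # how goals will be attained
    implementation_data: list = field(default_factory=list)  # data on implementation criteria

    def blank_cells(self):
        """Return the names of the columns still left blank for this goal."""
        cells = {
            "outcome_indicators": self.outcome_indicators,
            "strategies": self.strategies,
            "implementation_data": self.implementation_data,
        }
        return [name for name, cell in cells.items() if not cell]

row = GoalRow(
    expected_outcome="Reduced recidivism among juvenile offenders",
    outcome_indicators=["12-month rearrest rate"],
    strategies=["placement in supportive-participatory group homes"],
)
# row.blank_cells() flags the implementation-data column as unspecified
```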





(For an in-depth discussion of how to measure treatment environments for different kinds of programs—mental health institutions, prisons, family environments, military units, classrooms, businesses, schools, hospitals, and factories—see Conrad and Roberts-Gray 1988; Moos 1979, 1975, 1974.)

The process of specifying the desired treatment environment began with identified evaluation users, not with a scholarly literature search. The theory tested was that held by primary decision makers. Where resources are adequate and the design can be managed, the evaluators may prevail upon intended users to include tests of those theories the evaluators believe are illuminative. But first priority goes to providing intended users with information about the degree to which their own implementation ideals and treatment specifications have actually been realized in program operations. Causal models are sometimes forced on program staff when they bear no similarity to the models on which that staff bases its program activities. The evaluators' research interests are secondary to the information needs of primary intended information users in utilization-focused evaluation.

Connecting Goals and Implementation

In complex programs with multiple goals, it can be useful to engage staff in an exercise that links activities to outcomes and specifies measures for each. Exhibit 9.1 offers a matrix to guide this exercise. Once completed, the matrix can be used to focus the evaluation and decide what information would be most useful for program improvement and decision making.

Implementation Overview

This chapter has reviewed five evaluation approaches to implementation: (1) effort evaluation, (2) ongoing program monitoring, (3) process evaluation, (4) component evaluation, and (5) treatment
specification. Depending on the nature of the issues involved and the information needed, any one, two, or all five approaches might be employed. The point is that without information about actual program operations, decision makers are limited in interpreting performance data for program improvement. These different evaluations answer different questions and focus on different aspects of program implementation. The key is to match the type(s) of evaluation to the information needs of specific stakeholders and primary intended users. One of the decision makers we interviewed in our utilization study was emphatic on this point:

Different types of evaluations are appropriate and useful at different times. . . . We tend to talk about evaluation as if it's a single thing. The word evaluation should not be used generically. It's harmful. We ought to stop talking about evaluation as if it's a single homogeneous thing. [DM111:29]

Implementation is one possible focus for an evaluation. Not all designs will include a lot of implementation data. Other information may be more important, relevant, and useful to inform pending decisions. What is crucial is that during the process of framing the evaluation, the issue of implementation analysis is raised. Evaluators have a responsibility in their active-reactive-adaptive interactions with stakeholders to explore options with intended users to decide jointly what will be useful in the particular circumstances at hand.

Sometimes what primary users need and want varies from the evaluator's initial expectations.

Former Ambassador to China Winston Lord was once driving in the Chinese countryside with his wife. They stopped at an ancient Buddhist temple, where the senior monk greeted them enthusiastically. "Would you do this temple a great honor and favor for our future visitors, to guide and instruct them? Would you write something for us in English?"

Ambassador Lord felt quite flattered because he knew that, traditionally, only emperors and great poets were invited to write for the temple. The monk returned shortly carrying two wooden plaques and said: "To guide and instruct future English visitors, would you write on this plaque the word 'Ladies' and on this plaque the word 'Gentlemen.'"

May the writings of evaluators be as useful.

Implementation Evaluation • 213

MENU 9.1

Sample Implementation Evaluation Questions

Feasibility and Compliance Issues

1. What was originally proposed and intended for implementation?
2. What needs assessment or situation analysis informed program design?
3. What was the program's expected model?
4. What theory and assumptions undergirded the proposed model, if any?
5. Who has a stake in the program being implemented as proposed and originally designed?
6. What resources were anticipated for full implementation?
7. What staff competencies and roles were anticipated?
8. What were the original intended time lines for implementation?
9. What aspects of implementation, if any, involve meeting legal mandates?
10. What potential threats to implementation were anticipated during design?

Formative Evaluation Questions

1. What are the program's key characteristics as perceived by various stakeholders,
for example, participants, staff, administrators, funders? How similar or different
are those perceptions? What's the basis of differences?
2. What are the characteristics of program participants and how do those compare
to the intended target population for the program?
3. How do actual resources, staff competencies and experiences, and time lines
compare to what was expected?
4. What's working as expected? What's not working as expected? What challenges
and barriers have emerged? How has staff responded to those challenges and barriers?
5. What assumptions have proved true? What assumptions are problematic?
6. What do participants actually do in the program? What are their primary activities
(in detail)? What do they experience?
7. What do participants like and dislike? What are their perceptions of what's
working and not working? Do they know what they're supposed to accomplish
as participants? Do they "buy into" the program's goals and intended outcomes?


MENU 9.1 Continued

8. How well are staff functioning together? What are their perceptions about
what's working and not working? Do they know what outcomes they're aiming
for? Do they "buy into" the program's goals and intended outcomes? What are
their perceptions of participants? of administrators? of their own roles and effectiveness?
9. What has changed from the original design and why? On what basis are adapta-
tions from the original design being made? Who needs to "approve" such changes?
10. What monitoring system has been established to assess implementation on an
ongoing basis and how is it being used?

Summative Implementation Questions

1. As the program has been implemented, what model has emerged? That is, can the
program be modeled as an intervention or treatment with clear connections
between inputs, activities, and outcomes?
2. To what extent and in what ways was the original implementation design feasible?
What was not feasible? Why? Were deviations from the original design great
enough that what was actually implemented constitutes a different model, treat-
ment, or intervention from what was originally proposed? In other words, has the
feasibility and viability of the original design actually been tested in practice, or
was something else implemented?
3. How stable and standardized has the implementation become both over time and,
if applicable, across different sites?
4. To what extent is the program amenable to implementation elsewhere? What
aspects of implementation were likely situational? What aspects are likely generalizable?
5. What are the start-up and continuing costs of implementation?
6. Has implementation proved sufficiently effective and consistent that the program
merits continuation?

Lessons Learned Implementation Questions

1. What has been learned about implementation of this specific program that might
inform similar efforts elsewhere?
2. What has been learned about implementation in general that would contribute to
scholarly and policy research on implementation?
NOTE: For a larger menu of over 300 implementation evaluation questions, see King et al. 1987:129-41.
The Program's Theory of Action
Conceptualizing Causal Linkages

All the World's a Stage for Theory

In Tony Kushner's Pulitzer Prize-winning play, Angels in America, Part Two opens in the
Hall of Deputies, the Kremlin, where Aleksii Antedilluvianovich Prelapsarianov, the World's
Oldest Living Bolshevik, speaks with sudden, violent passion, grieving a world without Theory:

How are we to proceed without Theory? What System of Thought have these Reformers to present to this mad swirling planetary disorganization, to the Inevident Welter of fact, event, phenomenon, calamity? Do they have, as we did, a beautiful Theory, as bold, as Grand, as comprehensive a construct. . . ? You can't imagine, when we first read the Classic Texts, when in the dark vexed night of our ignorance and terror the seed-words sprouted and shoved incomprehension aside, when the incredible bloody vegetable struggled up and through into Red Blooming gave us Praxis, True Praxis, True Theory married to Actual Life. . . . You who live in this Sour Little Age cannot imagine the grandeur of the prospect we gazed upon: like standing atop the highest peak in the mighty Caucasus, and viewing in one all-knowing glance the mountainous, granite order of creation. You cannot imagine it. I weep for you.

And what have you to offer now, children of this Theory? What have you to offer in its place? Market Incentives? American Cheeseburgers? Watered-down Bukharinite stopgap makeshift Capitalism! NEPmen! Pygmy children of a gigantic race!

Change? Yes, we must change, only show me the Theory, and I will be at the barricades, show me the book of the next Beautiful Theory, and I promise you these blind eyes will see again, just to read it, to devour that text. Show me the words that will reorder the world, or else keep silent.

—Kushner 1994


Mountaintop Inferences

That evil is half-cured whose cause we know.

—Shakespeare

Causal inferences flash as lightning bolts in stormy controversies. While philosophers of

science serve as meteorologists for such storms—describing, categorizing, predicting, and
warning—policymakers seek to navigate away from the storms to safe harbors of reasonableness. When studying causality as a graduate student, I marveled at the multitude of
mathematical and logical proofs necessary to demonstrate that the world is a complex place
(e.g., Nagel 1961; Bunge 1959). In lieu of rhetoric on the topic, I offer a simple Sufi story
to introduce this chapter's discussion of the relationship between means and ends, informed
and undergirded by theory.

The incomparable Mulla Nasrudin was visited by a would-be disciple. The man, after many vicissitudes, arrived at the hut on the mountain where the Mulla (teacher) was sitting. Knowing that every single action of the illuminated Sufi was meaningful, the newcomer asked Nasrudin why he was blowing on his hands. "To warm myself in the cold, of course," Nasrudin replied.

Shortly afterward, Nasrudin poured out two bowls of soup, and blew on his own. "Why are you doing that, Master?" asked the disciple. "To cool it, of course," said the Mulla.

At that point, the disciple left Nasrudin, unable to trust any longer a man who used the same process to cause different effects—heat and cold.

—Adapted from Shah 1964: 79-80

Reflections on Causality in Evaluation

In some cases, different programs use divergent processes to arrive at the same outcome; in others, various programs use similar means to achieve different outcomes. Sometimes, competing treatments with the same goal operate side by side in a single program. Sorting out causal linkages is challenging both theoretically and methodologically.

Stated quite simply, the causal question in evaluation is this: Did the implemented program lead to the desired outcomes? However, in the previous chapters, it has become clear that delineating either program implementation or outcomes can lead us into conceptual and empirical labyrinths unto themselves. Now we must consider how to find openings where they connect to each other. To what extent and in what ways do the processes, activities, and treatments of a program cause or affect the behaviors, attitudes, skills, knowledge, and feelings of targeted participants? Such questions are complex enough in small, local programs, but imagine for a moment the complexity of attributing effects to causes in evaluating an entire multilayered, multisite initiative to integrate human services (Knapp 1996:25-26; Marquart and Konrad 1996).

One need know little about research to appreciate the elusiveness of definitive, pound-your-fist-on-the-table conclusions about causality. Our aim is more modest: reasonable estimations of the likelihood that particular activities have contributed in concrete ways to observed effects—emphasis on the word reasonable. Not definitive conclusions. Not absolute proof. Evaluation offers reasonable estimations of probabilities and likelihood, enough to provide useful guidance in an uncertain world (Blalock 1964). Policymakers and program decision makers, I find, typically understand and appreciate this. Hard-core academics and scientists often don't. As always, the question of primary intended users is . . . primary.

The Theory Option in Evaluation:

Constructing a Means-Ends Hierarchy

Causation. The relation between mosquitos and mosquito bites.

—Michael Scriven (1991b:77)

To venture into the arena of causality is to undertake the task of theory construction. This chapter suggests some simple conceptual approaches to theory construction aimed at elucidating and testing the theory upon which a program is based. A theory links means and ends. The construction of a means-ends hierarchy for a program constitutes a comprehensive description of the program's model. For example, Suchman (1967) recommended building a chain of objectives by trichotomizing objectives into immediate, intermediate, and ultimate goals. The linkages between these levels make up a continuous series of actions wherein immediate objectives (focused on implementation) logically precede intermediate goals (short-term outcomes) and therefore must be accomplished before higher-level goals (long-term impacts). Any given objective in the chain is the outcome of the successful attainment of the preceding objective and, in turn, is a precondition to attainment of the next higher objective.

Immediate goals refer to the results of the specific act with which one is momentarily concerned, such as the formation of an obesity club; the intermediate goals push ahead toward the accomplishment of the specific act, such as the actual reduction in weight of club members; the ultimate goal then examines the effect of achieving the intermediate goal upon the health status of the members, such as reduction in the incidence of heart disease. (Suchman 1967:51-52)

The means-ends hierarchy for a program often has many more than three links. In Chapter 7, I presented the mission statement, goals, and objectives of the Minnesota Comprehensive Epilepsy Program. This three-tier division—mission, goals, and objectives—was useful to get an overview of the program as an initial step in identifying what evaluation information might be most useful. Once that initial focus was determined, a more detailed, multitiered chain of objectives could be constructed. For example, the epilepsy program had educational, research, treatment, and administrative goals. Once the research goal was selected by decision makers as the evaluation priority, a more thorough means-ends hierarchy was constructed. Exhibit 10.1 illustrates the difference between the initial three-tier conceptualization and the more refined multitier chain of objectives developed later. To have constructed such a detailed, multitier chain of objectives for all seven epilepsy goals would have taken a great deal of time and effort. By using the simple, three-tier approach initially, it was possible to then focus on those goal areas in which conceptualizing a full chain of objectives (or means-ends hierarchy) was worth the time and effort.

The full chain of objectives that links inputs to activities, activities to immediate outputs, immediate outputs to intermediate outcomes, and intermediate outcomes to ultimate goals constitutes a program's theory. Any particular paired linkage in the theory displays an action and reaction: a hypothesized cause and effect. As one constructs a hierarchical/sequential model, it becomes clear that there is only a relative distinction between ends and means: "Any end or goal can be seen as a means to another goal, [and] one is free to enter the 'hierarchy of means and ends' at any point" (Perrow 1968:307). In utilization-focused evaluation, the decision about where to enter the means-ends hierarchy for a particular evaluation is made on the basis of what information would be most useful to the primary intended evaluation users. In other words, a formative evaluation might focus on the connection between inputs and activities (an implementation evaluation) and not devote resources to measuring outcomes higher up in the hierarchy until implementation was ensured. Elucidating the entire hierarchy does not incur an obligation to evaluate every linkage in the hierarchy. The means-ends hierarchy displays a series of choices for more focused evaluations while also establishing a context for such narrow efforts.

Suchman (1967:55) used the example of a health education campaign to show how a means-ends hierarchy can be stated in terms of a series of measures or evaluation findings. Rather than linking a series of objectives, in Exhibit 10.2, he displayed the theoretical hierarchy as a series of evaluative measurements.

How theory-driven an evaluation should be is a matter of debate, as is the question of what sources to draw on in theory construction (Bickman 1990; Chen and Rossi 1989). Evaluators who gather purely descriptive data about implementation or outcomes without connecting the two in some framework risk being attacked as atheoretical technicians. Yet, a program must have achieved a certain level of maturity to make the added effort involved in theory-driven evaluation fruitful. At times, all decision makers need and want is descriptive data for monitoring, fine-tuning, or improving program operations. However, attention to program theory can yield important insights and, in recent years, thanks especially to Chen's (1990) advocacy of theory-driven evaluation and the

EXHIBIT 10.1
Initial and Refined Epilepsy Program Means-Ends Theory

Initial Conceptualization of Epilepsy Program

Program Mission: To improve the lives of people with epilepsy through research
Program Goal: To publish high-quality, scholarly research on epilepsy
Program Objective: To conduct research on neurological, pharmacological, epidemiological, and social psychological aspects of epilepsy

Refined Conceptualization of Epilepsy Chain of Objectives

1. People with epilepsy lead healthy, productive lives
2. Provide better medical treatment for people with epilepsy
3. Increase physicians' knowledge of better medical treatment for epileptics
4. Disseminate findings to medical practitioners
5. Publish findings in scholarly journals
6. Produce high-quality research findings on epilepsy
7. Establish a program of high-quality research on epilepsy
8. Assemble necessary resources (personnel, finances, facilities) to establish a research program
9. Identify and generate research designs to close knowledge gaps
10. Identify major gaps in knowledge concerning causes and treatment of epilepsy
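Read from the bottom (item 10) up to the top (item 1), the refined chain is a sequence of hypothesized cause-effect links. The following sketch (hypothetical code; the function and variable names are illustrative, not from the book) shows one way such a chain can be represented so that each adjacent pair becomes a testable linkage and a focused evaluation selects only a slice of the hierarchy:

```python
# Sketch: a means-ends hierarchy as an ordered chain of objectives.
# The chain paraphrases Exhibit 10.1, ordered here from first step to
# ultimate goal (the exhibit lists it in the reverse order).

epilepsy_chain = [
    "Identify major gaps in knowledge about causes and treatment of epilepsy",
    "Identify and generate research designs to close knowledge gaps",
    "Assemble resources (personnel, finances, facilities) for a research program",
    "Establish a program of high-quality research on epilepsy",
    "Produce high-quality research findings on epilepsy",
    "Publish findings in scholarly journals",
    "Disseminate findings to medical practitioners",
    "Increase physicians' knowledge of better medical treatment",
    "Provide better medical treatment for people with epilepsy",
    "People with epilepsy lead healthy, productive lives",
]

def linkages(chain):
    """Each adjacent (means, end) pair is a hypothesized cause-effect link."""
    return list(zip(chain, chain[1:]))

def focus(chain, first_link, last_link):
    """Select a slice of links for a focused evaluation."""
    return linkages(chain)[first_link:last_link]

# A ten-objective chain yields nine testable linkages; a formative evaluation
# might examine only the first few (implementation-side) links.
implementation_links = focus(epilepsy_chain, 0, 4)
```

The `focus` helper reflects the point that elucidating the entire hierarchy does not obligate the evaluator to evaluate every linkage; users choose where to enter the chain.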

work of Connell et al. (1995) on using theories of change to frame evaluations of community initiatives, evaluators have been challenged to take a more active role in looking for opportunities to design evaluations on a solid foundation of theory.

Three Approaches to Program Theory

Three major approaches to program theory development for evaluation are:

1. The deductive approach—drawing on scholarly theories from the academic literature
2. The inductive approach—doing fieldwork on a program to generate grounded theory
3. The user-focused approach—working with intended users to extract and specify their implicit theory of action

The deductive approach draws on dominant theoretical traditions in specific scholarly disciplines to construct models of the relationship between program treatments and outcomes. For example, an evaluation of whether a graduate school teaches students to think critically could be based on the theoretical perspective of a phenomenography of adult critical reflection, as articulated by the Distin-

EXHIBIT 10.2
Theoretical Hierarchy of Evaluation Measures for a Health Education Campaign

Reduction in morbidity and mortality
Proportion of people in the target population who meet prescribed standards of behavior
Number whose behaviors change
Number whose opinions change
Number who learn the facts
Number who read it
Number of people who receive the literature
Amount of literature distributed
Number of pieces of literature available for distribution
Pretest literature by readability criteria

SOURCE: Adapted from Suchman 1967:55.
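Suchman's hierarchy of measures is, in effect, a funnel of counts: each level can retain only some fraction of the level below it. A minimal sketch of that arithmetic (the level names follow the exhibit; all counts are invented purely for illustration):

```python
# Sketch: Suchman-style hierarchy of evaluative measures as a funnel of counts.
# Levels run from initial effort toward ultimate outcome; the numbers are
# hypothetical, used only to show how drop-off between levels is computed.

funnel = [
    ("Pieces of literature available for distribution", 100_000),
    ("Amount of literature distributed",                 80_000),
    ("Number who receive the literature",                60_000),
    ("Number who read it",                               30_000),
    ("Number who learn the facts",                       12_000),
    ("Number whose opinions change",                      6_000),
    ("Number whose behaviors change",                     2_400),
]

def retention_rates(levels):
    """Fraction of each level retained at the next level up the hierarchy."""
    return [
        (upper_name, upper_n / lower_n)
        for (lower_name, lower_n), (upper_name, upper_n) in zip(levels, levels[1:])
    ]

for name, rate in retention_rates(funnel):
    print(f"{name}: {rate:.0%} of the previous level")
```

Stating each level as a measurable count, rather than an objective, is what lets the hierarchy double as an evaluation design: every adjacent pair of counts is an observable test of one linkage.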

guished Professor of Education Stephen Brookfield (1994), an approach that emphasizes the visceral and emotional dimensions of critical thought as opposed to purely intellectual, cognitive, and skills emphases. Illustrations of the deductive approach to evaluation are chronicled in Rossi and Freeman (1993) and Boruch, McSweeny, and Soderstrom (1978). However, the temptation in the deductive approach is to make the study more research than evaluation, that is, to let the literature review and theory testing take over the evaluation. Testing social science theories may be a by-product of an evaluation in which the primary purpose is knowledge generation (see Chapter 4), but the primary focus in this chapter is on testing practitioner theories about why they do what they do and what they think results from what they do. Utilization-focused evaluation involves primary intended users in specifying the program's theory and in deciding how much attention to give to testing the theory generated, including how much to draw on social science theory as a framework for the evaluation (Patton 1989).

The inductive approach involves the evaluator in doing fieldwork to generate theory. Staying with the example of evaluating whether graduate students learn to think critically, the inductive approach would involve assessing student work, observing students in class, and interviewing students and professors to determine what model of education undergirds efforts to impart critical thinking skills. Such an effort could be done as a study unto itself, for example, as part of an early evaluability assessment process, or it could be done in conjunction with a deductive effort based on a literature review. The product of the inductive approach, and therefore a major product of the evaluation, would be an empirically derived theoretical model of the relationship between program activities and outcomes framed in terms of important contextual factors.

User-Focused Theory of Action Approach

In the user-focused approach, the evaluator's task is to facilitate intended users, including program personnel, in articulating their operating theory. Continuing with the critical thinking example, this would mean bringing together students and professors to make explicit their educational assumptions and generate a model that could then be tested as part of the evaluation. In the purely inductive approach above, by way of contrast, the evaluator builds the theory from observations and fieldwork rather than from discussion and group facilitation with those involved.

What makes the user-focused approach challenging is that practitioners are seldom aware of their theory of action. The notion that people in programs operate on the basis of theories of action derives from the work of organizational development scholars Chris Argyris and Donald Schön (1978, 1974). They studied the connection between theory and practice as a means of increasing professional effectiveness:

We begin with the proposition that people hold theories of action about how to produce consequences they intend. Such theories are theories about human effectiveness. By effectiveness we mean the degree to which people produce their intended consequences in ways that make it likely that they will continue to produce intended consequences. Theories of action, therefore, are theories about effectiveness, and because they contain propositions that are falsifiable, they are also theories about truth. Truth in this case means truth about how to behave effectively. (Argyris 1982:83)

The phrase theories of action refers specifically to how to produce desired results in contrast to theories in general, which explain why some phenomenon of interest occurs. Deductive and inductive approaches to theory make use of programs as manifestations of some larger phenomenon of interest while theories of action are quite specific to a particular program or organization. Argyris and Schön (1978) distinguish two kinds of theories of action: (1) espoused theories—what people say or believe is their theory; and (2) theories-in-use—the bases on which people actually act. They drew on a great body of research showing the following:

People do not always behave congruently with their beliefs, values, and attitudes (all part of espoused theories). . . . Although people do not behave congruently with their espoused theories, they do behave congruently with their theories-in-use, and they are unaware of this fact. (Argyris 1982:85)

In this conundrum of dissonance between belief and practice lies a golden opportunity for reality testing: the heart of evaluation.

The user-focused theory of action approach can involve quite a bit of work, since few front-line practitioners in programs are schooled to think systematically in terms of theoretical constructs and relationships. Moreover, the idea of making their assumptions explicit and then testing them can be frightening. The user-focused evaluator, as facilitator of this process, must do at least five things:

1. Make the process of theory articulation understandable.
2. Help participants be comfortable with the process intellectually and emotionally.
3. Provide direction for how to articulate espoused theories that participants believe undergird their actions.
4. Facilitate a commitment to test espoused theories in the awareness that actual theories-in-use, as they emerge, may be substantially different from espoused theories.
5. Keep the focus on doing all this to make the evaluation useful.

The causal model to be tested in the user-focused evaluation is the causal model upon which program activities are based, not a model extracted from academic sources or fieldwork. First priority goes to providing primary stakeholders with information about the degree to which their own implementation ideals and treatment specifications actually achieve desired outcomes through program operations. The evaluator's own theories and academic traditions can be helpful in discovering and clarifying the program's theories of action, but testing intended users' and decision makers' theories of programmatic action is primary; the evaluator's scholarly interests are secondary.

The importance of understanding the program's theory of action as perceived by key stakeholders is explained in part by basic insights from the sociology of knowledge and work on the social construction of reality (Holzner and Marx 1979; Berger and Luckmann 1967; Schutz 1967). This work is built on the observation of W. I. Thomas that what is perceived as real is real in its consequences. In this case, espoused theories are what practitioners perceive to be real. Those espoused theories, often implicit and only espoused when asked for, have real consequences for what practitioners do. Elucidating the theory of action held by primary users can help them be more deliberative about what they do and more willing to put their beliefs and assumptions to an empirical test through evaluation. In short, the user-focused approach challenges decision makers, program staff, funders, and other users to engage in reality testing, that is, to test whether what they believe to be true (their espoused theory of action) is what actually occurs (theory-in-use).

A Reality-Testing Example

Let me offer a simple example of user-focused, theory-of-action reality testing. A State Department of Energy allocated conservation funds through 10 regional districts. An evaluation was commissioned by the department to assess the impact of local involvement in priority setting. State and regional officials articulated the following equitable and uniform model of decision making as their espoused theory of action:

1. State officials establish funding targets for each district based on needs assessments and available funds.
2. District advisory groups develop proposals to meet the state targets with broad citizen input.
3. The State approves the budgets based on the merit of the proposals within the guidelines, rules, and targets provided.
4. Expected result: Approved funds equal original targets.

In short, the espoused theory of action was that decisions are made equitably based on explicit procedures, guidelines, and rules. The data showed this to be the case in 6 of the 10 districts. In the other 4 districts, however, proposals from the districts exceeded the assigned target amounts by 30% to 55%; that is, a district assigned a target of $100 million submitted proposals for $140 million (despite a "rule" that said proposals could not exceed targets). Moreover, the final, approved budgets exceeded the original targets by 20% to 40%. The district with a target of $100 million and proposals for $140 million received $120 million. Four of the districts, then, were not engaged in a by-the-book equitable process; rather, their process was negotiated, personal, and political. Needless to say, when these data were presented, the six districts that followed the guidelines and played the funding game by what they thought were uniform rules—the districts whose proposals equaled their assigned targets—were outraged. Testing the espoused theory of uniformity and fairness revealed that the reality (theory-in-use) in four districts did not match the espoused theory in ways that had significant consequences for all concerned.

This is a simple, commonsense example of a user-focused approach to articulating and testing a program's theory of action. Nothing elegant. No academic trappings. The espoused theory of action is a straightforward articulation of what is supposed to happen in the process that is intended to achieve desired outcomes. The linkages between processes and outcomes are made explicit. Evaluative data then reveal the theory-in-use, that is, what actually happens. Program staff, other intended users, and evaluators can learn a great deal from engaging in this collaborative process (e.g., Layzer 1996).

A Menu of Theory-Based Approaches

Each of the three approaches to program theory—deductive, inductive, and user-focused—has advantages and disadvantages. These are reviewed in Menu 10.1, drawing on the work of Lipsey and Pollard (1989) and Chen (1989). The strategic calculations a utilization-focused evaluator must make include determining how useful it will be to spend time and effort elucidating a theory of action (or more than one where different perspectives exist); how to keep theory generation from becoming esoteric and overly academic; how formal to be in the process; and what combinations of the three approaches, or relative emphasis, should be attempted. Factors to consider in making these calculations will be clearer after some more examples, which follow, but the focus here is on the user-focused, theory-of-action approach.
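The reality test in the State Department of Energy example above rests on simple arithmetic: for each district, compare proposed and approved funds against the assigned target. A hypothetical sketch of that comparison (the district figures are invented; only the overage ranges echo those reported above):

```python
# Sketch: testing an espoused theory of action against budget data.
# Espoused theory: proposed and approved funds should equal the assigned target.
# District figures are invented; only the overage ranges (30%-55% on proposals,
# 20%-40% on approvals) echo those reported in the text.

districts = {
    # name: (target, proposed, approved), in millions of dollars
    "A": (100, 100, 100),
    "B": (100, 140, 120),   # the district cited in the text
    "C": (80, 110, 104),
    "D": (120, 120, 120),
}

def overage(actual, target):
    """Percentage by which an amount exceeds its target."""
    return 100 * (actual - target) / target

# Districts whose proposals or approvals exceed their targets were not
# following the espoused by-the-book process; their theory-in-use differed.
deviant = {
    name: (overage(proposed, target), overage(approved, target))
    for name, (target, proposed, approved) in districts.items()
    if proposed > target or approved > target
}

for name, (prop_over, appr_over) in sorted(deviant.items()):
    print(f"District {name}: proposals +{prop_over:.1f}%, approved +{appr_over:.1f}%")
```

The evaluation's contribution is not the arithmetic itself but making the espoused rule explicit enough that a simple comparison like this can confirm or contradict it.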
MENU 10.1

Three Approaches to Program Theory: Advantages and Disadvantages

[Table comparing the advantages and disadvantages of the deductive, inductive, and user-focused approaches to program theory, drawing on Lipsey and Pollard (1989) and Chen (1989).]

Getting at Assumptions and Causal Connections

Identifying Critical Validity Assumptions

The purpose of thoroughly delineating a program's theory of action is to assist practitioners in making explicit their assumptions about the linkages between inputs, activities, immediate outputs, intermediate outcomes, and ultimate goals. Suchman (1967) called beliefs about cause-effect relationships the program's validity assumptions. For example, many education programs are built on the validity assumptions that (1) new information leads to attitude change and (2) attitude change affects behavior. These assumptions are testable. Does new knowledge change attitudes? Do changed attitudes lead to changed behaviors?

As validity assumptions are articulated in a means-ends hierarchy, the evaluator can work with intended users to focus the evaluation on those critical linkages where information is most needed at that particular point in the life of the program. It is seldom possible or useful to test all the validity assumptions or evaluate all the means-ends linkages in a program's theory of action. The question is one of how freely such validity assumptions are made and how much is at stake in testing the validity of critical assumptions (Suchman 1967:43). In a utilization-focused evaluation, the evaluator works with the primary intended users to identify the critical validity assumptions where reduction of uncertainty about causal linkages could make the most difference.

The evaluator's beliefs about the validity of assumptions are less important than what staff and decision makers believe. An evaluator can have greater impact by helping program staff and decision makers empirically test their causal hypotheses than by telling them such causal hypotheses are nonsense. Not only does the wheel have to be re-created from time to time, its efficacy has to be restudied and reevaluated to demonstrate its usefulness. Likewise, the evaluator's certain belief that square wheels are less efficacious than round ones may have little impact on those who believe that square wheels are effective. The utilization-focused evaluator's task is to delineate the belief in the square wheel and then assist the believers in designing an evaluation that will permit them to test for themselves their own perceptions and hypotheses.

This does not mean that the evaluator is passive. In the active-reactive-adaptive process of negotiating the evaluation's focus and design, the evaluation facilitator can suggest alternative assumptions and theories to test, but first priority goes to evaluation of validity assumptions held by primary intended users.

Filling in the Conceptual Gaps

Helping stakeholders identify conceptual gaps in their theory of action is another task for the user-focused evaluation facilitator. The difference between identifying validity assumptions and filling in conceptual gaps can be illustrated as follows. Rutman (1977) has argued that the idea of using prison guards as counselors to inmates ought never have been evaluated (Ward, Kassebaum, and Wilner 1971) because, on the face of it, the idea is nonsense. Why would anyone ever believe that such a program could work? But clearly, whether they should have or not, many people did believe that the program would work. The evaluator's task is to fill in the conceptual gaps in this theory of action so that critical evaluative information needs
can be identified. For example, are there initial selection processes and training programs for guards? Are guards supposed to be changed during such training? The first critical evaluation issue may be whether prison guards can be trained to exhibit desired counselor attitudes and behaviors. Whether prison guards can learn and practice human relations skills can be evaluated without ever implementing a full-blown program.

Filling in the gaps in the program's theory of action goes to the heart of the implementation question. What series of activities must take place before there is reason even to hope that impact will result? If activities and objectives lower in the means-ends hierarchy will not or cannot be implemented, then evaluation of ultimate outcomes is problematic.

    There are only two ways one can move up the scale of objectives in an evaluation: (a) by proving the intervening assumptions through research, that is, changing an assumption to a fact, or (b) by assuming their validity without full research proof. When the former is possible, we can then interpret our success in meeting a lower-level objective as automatic progress toward a higher one. . . .

    When an assumption cannot be proved . . . we go forward at our peril. To a great extent, the ultimate worth of evaluation for public service programs will depend upon research proof of the validity of assumptions involved in the establishment of key objectives. (Suchman 1967:57)

The National Clean Air Act and its amendments in the mid-1970s provide a good example of legislation in which policy and planning activity focused on initial objectives and ultimate goals but failed to delineate crucial intervening objectives. The ultimate goal was cleaner air; the target of the legislation was a handful of engines that each auto manufacturer tested before going to mass production. Authorization for mass production was given if these prototypes operated under carefully controlled conditions for 50,000 miles. Cars that failed pollution tests as they left the assembly line were not withheld from dealers. Cars on the road were not inspected to make sure that pollution control equipment was still in place and functioning properly. Prototypes were tested for 50,000 miles, but most cars are eventually used for 100,000 miles, with pollution in older cars being much worse than that in new ones. In short, there are many intervening steps between testing prototype automobiles for pollution control compliance and improving air quality. As Bruce Ackerman (1977) predicted,

    Over a period of time, the manufacturers will build cleaner and cleaner prototypes. Billions of dollars will be spent on the assembly line to build devices that look like these prototypes. But until Congress, the EPA, and the states require regular inspections of all cars on the road, very little will come of all this glittering machinery.

    Indeed, we could save billions if we contented ourselves with dirtier prototypes, but insisted on cleaner cars. . . . Congressmen themselves woefully exaggerate the importance of their votes for cleaner prototypes. They simply have no idea of the distance between prototype and reality. They somehow imagine that the hard job is technological innovation and that the easy job is human implementation. (p. 4)

Delineating an espoused theory of action involves identifying critical assumptions, conceptual gaps, and information gaps. The conceptual gaps are filled by logic, discussion, and policy analysis. The information gaps are filled by evaluation research.

Using the Theory of Action to Focus the Evaluation: The New School Case

Once an espoused theory of action is delineated, the issue of evaluation focus remains. This involves more than mechanically evaluating lower-order validity assumptions and then moving up the hierarchy. Not all linkages in the hierarchy are amenable to testing; different validity assumptions require different resources for evaluation; data-gathering strategies vary for different objectives. In a summative evaluation, the focus will be on outcomes attainment and causal attribution. For formative evaluation, the most important factor is determining what information would be most useful at a particular point in time. This means selecting what Murphy (1976) calls targets of opportunity in which additional information could make a difference to the direction of incremental, problem-oriented, program decision making:

    In selecting problems for analysis, targets of opportunity need to be identified, with political considerations specifically built into final choices. Planning activity in a certain area might be opportune because of expiring legislation, a hot political issue, a breakdown in standard operating procedures, or new research findings. At any time, certain policies are more susceptible to change than others. (p. 98)

Targets of opportunity are those evaluation questions about which primary information users care the most and most need evaluative information for decision making. Having information about and answers to those select questions can make a difference in what is done in the program. An example from an evaluation of the New School of Behavioral Studies in Education, University of North Dakota, illustrates this.

The New School of Behavioral Studies in Education was established as a result of a statewide study of education conducted between 1965 and 1967. The New School was to provide leadership in educational innovations with an emphasis on individualized instruction, better teacher-pupil relationships, an interdisciplinary approach, and better use of a wide range of learning resources (Statewide Study 1967:11-15). In 1970, the New School gained national recognition when Charles Silberman described the North Dakota Experiment as a program that was resolving the "crisis in the classroom" in favor of open education. The New School established a master's degree, teaching-intern program in which interns replaced teachers without degrees so that the latter could return to the university to complete their baccalaureates. The cooperating school districts released those teachers without degrees who volunteered to return to college and accepted the master's degree interns in their place. Over four years, the New School placed 293 interns in 48 school districts and 75 elementary schools, both public and parochial. The school districts that cooperated with the New School in the intern program contained nearly one third of the state's elementary school children.

The Dean of the New School formed a task force of teachers, professors, students,
parents, and administrators to evaluate the program. In working with that task force, I constructed the theory of action shown in Exhibit 10.3. The objectives stated in the first column are a far cry from being clear, specific, and measurable, but they were quite adequate for discussions aimed at focusing the evaluation question. The second column lists validity assumptions underlying each linkage in the theory of action. The third column shows the measures that could be used to evaluate objectives at any level in the hierarchy. Ultimate objectives are not inherently more difficult to operationalize. Operationalization and measurement are separate issues to be determined after the focus of the evaluation has been decided.

When the Evaluation Task Force discussed Exhibit 10.3, members decided they already had sufficient contact with the summer program to assess the degree to which immediate objectives were being met. They also felt they had sufficient experience to be comfortable with the validity assumption linking objectives six and seven. With regard to the ultimate objectives, the task force members said that they needed no further data at that time in order to document the outcomes of open education (objectives one and two), nor could they do much with information about the growth of the open education movement (objective three). However, a number of critical uncertainties surfaced at the level of intermediate objectives. Once students left the summer program for the one-year internships, program staff were unable to carefully and regularly monitor intern classrooms. They didn't know what variations existed in the openness of the classrooms, nor did they have reliable information about how local parents and administrators were reacting to intern classrooms. These were issues about which information was wanted and needed. Indeed, for a variety of personal, political, and scholarly reasons, these issues made quite good evaluation targets of opportunity. The evaluation therefore focused on three questions: (1) To what extent are summer trainees conducting open classrooms during the regular year? (2) What factors are related to variations in openness? (3) What is the relationship between variations in classroom openness and parent/administrator reactions to intern classrooms?

At the outset, nothing precluded evaluation at any of the seven levels in the hierarchy of objectives. There was serious discussion of all levels and alternative foci. In terms of the educational literature, the issue of the outcomes of open education could be considered most important; in terms of university operations, the summer program would have been the appropriate focus; but in terms of the information needs of the primary decision makers and primary intended users on the task force, evaluation of the intermediate objectives had the highest potential for generating useful, formative information.

In order to obtain the resources necessary to conduct this evaluation, Vito Perrone, dean of the New School, had to make unusual demands on the U.S. Office of Education (OE). The outcomes of the New School teaching program were supposed to be evaluated as part of a national OE study. Perrone argued that the national study, as designed, would be useless to the New School. He talked the OE people into allowing him to spend the New School's portion of the national evaluation money on a study designed and conducted locally. The subsequent evaluation was entirely the creation of the local task force described above, and it produced instruments and data that became an integral part of the North Dakota program (see Pederson 1977).
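The logic of focusing an evaluation on critical linkages lends itself to a simple computational sketch. The short Python fragment below is purely illustrative; the objectives, uncertainty ratings, and stakes are hypothetical and are not drawn from the New School evaluation. It models a means-ends hierarchy as objectives connected by validity assumptions, each rated for how unsettled the causal claim is and how much rides on it, and flags as targets of opportunity those linkages where both ratings are high.

```python
from dataclasses import dataclass

@dataclass
class Objective:
    """One level in a means-ends hierarchy (immediate through ultimate)."""
    name: str
    measures: list  # candidate ways to operationalize this objective

@dataclass
class Linkage:
    """A validity assumption connecting a lower objective to a higher one."""
    lower: Objective
    higher: Objective
    assumption: str
    uncertainty: float  # 0-1: how unsettled the causal claim is
    stakes: float       # 0-1: how much rides on the claim being true

def targets_of_opportunity(linkages, top_n=2):
    """Rank linkages where reducing uncertainty could make the most
    difference: high uncertainty combined with high stakes."""
    return sorted(linkages,
                  key=lambda l: l.uncertainty * l.stakes,
                  reverse=True)[:top_n]

# Hypothetical ratings for the information -> attitudes -> behavior
# chain discussed earlier in the chapter.
info = Objective("Provide new information", ["curriculum audit"])
attitudes = Objective("Change attitudes", ["attitude survey"])
behavior = Objective("Change behavior", ["classroom observation"])

links = [
    Linkage(info, attitudes,
            "New information leads to attitude change", 0.4, 0.5),
    Linkage(attitudes, behavior,
            "Attitude change affects behavior", 0.8, 0.9),
]

for link in targets_of_opportunity(links, top_n=1):
    print(link.assumption)  # -> Attitude change affects behavior
```

The ranking rule here, uncertainty times stakes, is only one way to operationalize Suchman's question of how much is at stake in testing critical assumptions; in a utilization-focused evaluation, the ratings would come from the primary intended users themselves, not from the evaluator.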

The national study produced large volumes of numbers (with blanks entered on the lines for North Dakota) and, as far as I can tell, was of no particular use to anyone.

Developing a Theory of Action as Process Use

Thus far, this discussion of theory of action has been aimed at demonstrating the value of this conceptual strategy as a way of focusing evaluation questions and identifying the information needs of primary stakeholders. At times, helping program staff or decision makers to articulate their programmatic theory of action is an end in itself; at other times, primary users want to go further and test that theory as part of the evaluation. This means moving beyond discussing the theory of action to gathering data on it. Such was the case in an evaluation of a multifaceted home nursing program for the elderly. Facilitating articulation of the program's theory of action helped staff sort out which of the many things they did were really central to the outcomes they wanted. As a member of an evaluation task force for farming systems research, I worked with colleagues to identify the critical elements of "a farming systems approach" and place those elements in a hierarchy that constituted a developmental theory of action. In these and many other cases, my primary contributions were pro