
Evaluation

Copyright © 1996
SAGE Publications
London, Thousand Oaks
and New Delhi.
Vol 2(4): 393-404

The Theory behind Practical Evaluation


MICHAEL SCRIVEN
Evaluation & Development Group
Two claims are put forward in this article: (1) practical evaluation, especially
practical program evaluation, depends on theoretical assumptions that put
its procedures and conclusions at risk; and (2) practical evaluation is
unnecessarily and seriously limited because of its failure to use theory as a
way to recognize and develop solutions to problems not yet solved in
practice. Both claims raise the question of the relation of theory to practice
in our discipline, and most of the article concerns this question. The focus of
the discussion is the recent Handbook of Practical Program Evaluation. At the
end, one possible low-level theory about evaluation-essentially, a
conception of evaluation as a discipline-is proposed.

Overview
Mathematicians operate with little concern for the issues in the subject known as the
’foundations of mathematics’ or ’metamathematics’ because it appears-although this
is disputed-that only small parts of mathematics would be significantly affected by dif-
ferent resolutions of the foundational disputes. On the other hand, curriculum and
materials developers in school mathematics are highly dependent on theoretical assump-
tions about learning in general, and psychological development in the child. It is argued
here that the second situation is nearer the truth for program evaluation-and for any
other branch of evaluation. The second kind of model is extremely common, not
exceptional: other examples of practical professions with the same high dependence on
theory are psychotherapy, computer design, business economics, and management.
To provide a reality check for this discussion, frequent reference will be made to the
collection of 25 essays in A Handbook of Practical Program Evaluation (Wholey et al.,
1994), referred to hereafter as ’the Handbook’. It should be understood, however, that
this is in no way a review of the book, which deserves detailed analysis. In particular, I
base these remarks on general features of the book, and do not assess chapters, because
the latter would require much more detail.
The book has four sections: Evaluation Design (Quantitative and Qualitative
Approaches); Practical Data Collection Procedures; Practical Data Analysis; and
Planning and Managing Evaluation for Maximum Effectiveness. One or more of the
editors writes or co-authors five of the 25 chapters. As you will notice from the section
titles, a great deal of the content overlaps with social-science methodology. At this
stage of the evaluation game, it is probably correct to think that this is the most useful
emphasis, because so many evaluators are either short on that background or not sure
how to apply it to evaluation investigations. However, the absence of a section or
chapter that could be called ’evaluation-specific methodology’ underlies and sympto-
matizes many of the problems mentioned here.
This book is important because it represents much of the thinking amongst up-
market program evaluators in the US field, particularly those who make their living
doing or supervising program evaluations. They are mainly government employees or
contractors-or people who are teaching those who have those roles. The contributors
represent a high-quality sample from that group, which ensures that the book will be
influential, as indeed it deserves to be, given its very useful contents. It is particularly
worth noting that these include an overview of the present situation in and problems
with the use of evaluation, especially by the US government, and some very good
discussion of evaluation reporting.
Apart from the high quality of its contributors, the Handbook is also useful for our
purposes because it contains no indexed reference to evaluation theory.¹ This is
consistent with its title, but it raises the issues addressed by this article, and the
argument here is that one can no longer-and should no longer-design, do, monitor,
or teach the skills of practical program evaluation without clarifying the underlying
theoretical assumptions. This claim has nothing to do with the sermons we have all
heard from the constructivists about how everything we do, in evaluation or outside it,
depends on our individual constructions of reality, including our epistemological
theories. The discussion in this short article ignores that kind of all-pervasive
philosophical theory, since it afflicts us all, research mathematician and mathematics
curriculum developer alike. Hence it has no impact on the present issue of whether
evaluation practitioners are more like one of these models than the other, insofar as
their practice requires or at least benefits from some understanding of evaluation
theory-or some defense by someone who does understand evaluation theory²-in
order to establish its validity across the full range of program evaluation.
The bottom line of the critique here involves two kinds of conclusion. The first is that
we should put some restrictions on the confidence we place in some of the Handbook’s
recommendations, because they rest on weak foundations. These suggestions are
referred to here as errors of commission. Second, this article is intended to indicate
ways in which readers can further strengthen their practical program evaluation skills
by attention to some issues in evaluation theory-issues that are not mentioned in the
Handbook. These are to be called errors of omission. Many of the errors of omission
stem from the lack of an overview of the nature of evaluation, one of the most
important products of even a minimal theory of evaluation. At this conceptual level, a
theory marks off boundaries to demarcate evaluation from other types of investigation:
for example, descriptive, diagnostic, causal or correlational research in the social
sciences. And it may identify connections as well as differences.
At a slightly more detailed level, a theory generates a taxonomy of applications-and
setting this up leads one to notice empty lines or cells. The Handbook suffers, as we see,
from a lack of this systematic kind of approach based on the concepts of evaluation
rather than those of all practical social-science investigations. One perspective on this
situation is to say that evaluation is a very young discipline-although it is a very old
practice-and that in the early years of developing a discipline that refers to an ancient
practice, the gaps between theory and practice are large. Of course, this means that the
payoffs from theory to practice, when we fill the gaps, are sometimes very large.
Psychology, probability theory, game theory and computer science are other examples
of this phenomenon from our era. For logic and astronomy, on the other hand, the payoff
from theory for practice has been going on for more than two millennia.

Sins of Omission and Sins of Commission


We can begin by setting aside much of the substantial and valuable discussions of what
might be called practical methodology, which are quite extensive in the Handbook.
These include discussions of data access, the use of true and quasi-experimental designs,
focus groups, survey design, the use of expert judgment, etc. Most of this methodology is
imported from the social sciences. Of course, some of these methodologies are debata-
ble, at more or less their own level-but they are not highly dependent on a theory of
evaluation per se. There is another part of practical methodology, however, which deals
with extremely important but more general issues, and that part, it is suggested here, does
rest heavily for its validation on very basic theoretical issues in the logic of evaluation.
Some examples of possible errors of omission-whose absence also leads to some
errors of commission-all of great importance in practical evaluation, include: the
difficult problem of how to infer legitimately from evaluative conclusions about a
program to recommendations about its remediation or disposition (missed as a problem
despite the extensive treatment of recommendations here, which simply assumes they
follow necessarily); the problem of what, if anything, can be inferred about the quality
of an evaluation from a failure to implement it; how to do needs assessments, and the
role of needs assessments done by the evaluator when the government has defined the
objectives of a program (are they futile or still crucial?); the identification of
disqualifying bias in an evaluator (e.g. does the degree of involvement in the field
required for expertise almost always create bias?), or of an inappropriate attitude
towards evaluation in an expert subject-matter consultant (e.g. do they think that mere
subject-matter expertise is sufficient for good evaluation?).
Another problem is that the failure to discuss the connection of theory with practice
leaves the reader poorly armed for the kind of criticism of evaluation findings that
attacks their foundations, for example, (1) the generic attack on the use of experts, (2)
attacks on the unscientific nature of any evaluative conclusion, or (3) the absence of
concern with ’traditional values’ in, for example, educational evaluation. That kind of
fundamental criticism comes up with considerable frequency in the corridors of power
in Washington and other capitals, not to mention the corridors of ivory towers. (Note
that the Handbook recommends that evaluators should be able to ’sell’ both evaluation
and particular evaluation results.)
However, one of the most serious consequences of the failure to set 600 pages of
practical advice into some kind of theoretical framework-or at least to indicate that
each practice needs to be set in some theoretical framework-is the resultant
narrowness and lack of perspective, the main sin of omission. Further examples:

1. One cannot, or should not, do much program evaluation without getting into
personnel evaluation (a program is, after all, just the work of its personnel),
especially if you want to finish up making formative recommendations rather
than black-box summative judgments-but the term or topic of personnel
evaluation never shows up.
2. You cannot do evaluation across the range of federal or state programs without
being able to do product evaluation, because many of those programs are either
directly or incidentally concerned with the production of educational or other
products. However, there is no mention here of how to do product evaluation, or
where to find out how to do it. Is the assumption that any competent program
evaluator can do product evaluation? Interesting assumption-but one that needs
to be given some support, support which will of course require a little
understanding of product evaluation. (Given that the leading institutions for
doing consumer-product evaluation-the consumer associations-are methodo-
logically very confused, this is not a promising line [see Scriven, 1994].) Is the
assumption that one just needs to survey some consumers about the quality of the
product? That view is exactly as superficial as the view that program evaluation
only requires a survey of program recipients.
3. What about proposal evaluation, which is in a sense the evaluation of program
scenarios, i.e. programs, before they start? There is a good discussion of
evaluability here, but it does not address this question.
4. Nor is there any discussion of the evaluation of RFPs-requests for proposals.
(The same discussion of evaluability, under which some aspects of RFP
evaluation could be included, skips over this issue, which needs to be addressed
directly, since almost all proposal evaluation is done using an invalid model-
allocating 100 points across a set of criteria of merit.)
These last two areas (3) and (4) are ones where the evaluator can have a huge influence
on the avoidance of time- and money-wasting activities, and may even be able to get
some requirements in about baseline data-gathering, often essential to getting a
worthwhile evaluation. But there are other reasons why proposal and RFP evaluation
should come into a handbook for program evaluation. For example, many large
programs subcontract work, and the way they do it should be examined by the program
evaluator. Just as one must look to see if inequity is present in the appointment or
promotion of staff, as part of judging the management of a program, so one must look
to see if invalidity is present in the selection of subcontractors.
Apart from cutting these other fields out of program evaluation, a decision that is at
least damaging and sometimes almost fatal, the lack of any overview of evaluation has
another bad effect. It narrows the field of view of the program evaluator in a way that
forces her or him to reinvent many wheels that were long ago constructed and used in
other fields of evaluation. For example, personnel evaluation long ago worked out a
valid method for scoring candidates which transfers directly to the integration of
subdimensions of a program evaluation, or to the evaluation of proposals.
It avoids one fatal flaw in the standard federal method for doing the latter-namely the
failure to require minimum scores on some or (usually) all of those criteria.
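To make the structural point concrete, here is a minimal sketch (in Python, using hypothetical criteria, weights and minimum scores that are not drawn from the Handbook or from any actual personnel-evaluation scheme) of a scoring model that combines weighting with required minimums, so that strength on one criterion cannot buy off failure on another:

# Minimal sketch, with hypothetical criteria and numbers: weighted scoring plus
# required minimum ('bar') scores on each criterion, the feature missing from a
# pure 100-point allocation.
CRITERIA = {
    # name: (weight, minimum acceptable score on a 0-10 scale)
    "staffing":        (0.30, 4),
    "cost":            (0.20, 3),
    "evaluability":    (0.20, 5),
    "ethical_conduct": (0.30, 6),
}

def evaluate(scores):
    """Return (passes_all_bars, weighted_total) for one candidate or proposal."""
    passes = all(scores[name] >= minimum
                 for name, (_, minimum) in CRITERIA.items())
    total = sum(weight * scores[name]
                for name, (weight, _) in CRITERIA.items())
    return passes, total

proposal = {"staffing": 9, "cost": 8, "evaluability": 7, "ethical_conduct": 2}
ok, total = evaluate(proposal)
# Despite a high weighted total (6.3), the proposal falls below the bar on
# ethical conduct and so is rejected.
print(ok, round(total, 2))

The sketch is meant only to exhibit the logic: a pure allocation of 100 points across criteria has no way of expressing the second, bar-setting requirement.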

Ironically, the failure to ’make the theory connection’ directly undercuts the practical
value of the advice here (in the preceding points, we’ve been emphasizing the indirect
damage). Here are three examples:
1. Time and again, a number of items or factors are referred to as features to look
for, for example, in checking the quality of evaluations. But they are rarely
identified as checklists of a particular type (e.g. desiderata vs necessitata;
procedural vs evaluative; iterative vs one-shot), and never subjected to the highly
desirable tidying-up that is an essential by-product of any serious look at the logic
of checklists (e.g. looking at completeness, non-overlap, equal level of generality,
strict control of hierarchical structure) (for more details, see Scriven, forth-
coming). This failure significantly weakens much of the practical advice.
2. Although there is a good deal of reference to cost analysis, including an
excellent-indeed, essential-essay by James Edwin Kee, there is only passing
reference to psychological and other non-monetary costs. These costs-of
programs or evaluations-include, for example, program-induced stress and the
clinical entity known as ’evaluation anxiety’. Recognizing these costs is a
virtually automatic consequence of a serious theoretical underpinning of the
concept of cost, and they are sufficiently practical to spell the success or failure of
some programs and evaluations, all by themselves.
3. Everyone doing practical evaluation knows that before one starts writing the
Conclusions section, one has data and judgments on a number of highly
independent dimensions of program quality.
How does one pull these together? Presumably by using some kind of weighting.
Where do the weights come from? Is the use of numerical weights always appropriate,
sometimes appropriate, or never appropriate? It was mentioned that looking at
personnel evaluation would give some solutions, but here we are saying that the real
need is to develop the general theory of evaluation further, until it can be squeezed for
practical solutions-including a solution to this problem. When we do this, we find that
even the model in personnel evaluation is not fully general, and a more general model,
one that will apply to all fields of practical evaluation, is readily feasible. The reader gets no
help with this.
Would meeting all these suggestions for further coverage add up to many more
pages, when the book is already very long and expensive (US$60)? Not at all; it would
not have been hard to drop one of the five chapters by the editors in favor of an
overview chapter that goes into some of these connections and considerations, to write it
first, and to send it out along with the instructions to authors, accompanied by the request
that they keep it in mind as appropriate. The result would have considerably
strengthened the book against the criticisms listed above and expanded below. (The
instructions to authors that did go out were quite directive, and admirable in their
emphasis on practical solutions even when these were not scientifically perfect.)
Like all emerging disciplines, evaluation has to fight certain generic battles, some
internal and some external. The great internal battles are between those who value the
practice so much more than the theory that they weaken the practice by ignoring the
basic theory; and those who go in the opposite direction, as psychology did in its early
days-and logic in its middle age-and spend so much time on theoretical squabbles
that no benefits to practice result. The great external battles for evaluation have been
against those who wish to dismiss it as entirely lacking in objectivity, and against those
who wish to attain objectivity for it by making it a mere matter of information-
gathering (and usually, therefore, merely part of another pre-existing discipline, such as
management science). The Handbook makes neither of the latter two mistakes. Its
editors introduce the volume with a definition of evaluation that is highly evaluative,
and the volume stands as its own argument for the possibility of objectivity in dealing
with many evaluative issues. The Handbook’s problem is in its lack of concern with the
minimal theoretical underpinning required to avoid problems in practice.
From the foregoing paragraphs, a slight enlargement of the ’sins of omission vs sins
of commission’ grouping emerges. The following five types of errors were identified as
occurring when setting out advice for practical program evaluation, while under-
estimating the importance of theory for practice (they are not sharply distinct):
1. Shortcomings in the area of general practical methodology, for example,
omission of needs assessment, errors about the inference from evaluative
conclusions to recommendations.
2. Lack of defense against attacks on the foundations of evaluation, for example,
from those who think evaluative conclusions cannot be inferred from scientific
premises (hence evaluation is unscientific), or that evaluations cannot deal with
ethical issues (and are hence unable to generate solutions to ethical problems). In
the practical world, secure theoretical foundations are often the best defense.
3. Narrowness in perception of the field, especially the failure to cover fields
conventionally seen as other than program evaluation but that are a de facto part
of much program evaluation, for example, product and proposal evaluation.
4. Failure to benefit from general methodological solutions discovered in other
fields of evaluation, for example, personnel evaluation.
5. Lack of generality in the practical techniques because of failure to critique the
concepts of the general theory of which they are special applications, for
example, failing to make any systematic extension of cost analysis to non-
monetary costs.

In the next four sections, this list is significantly extended, and some reference is made
to other examples of the above. In the final section, we go on to some positive
recommendations.

Ethics, Law, and Goals


Much to the credit of the authors, and editors, there are quite a few specific references
to ethics here-by contrast with the best-selling textbook in evaluation, Rossi and
Freeman-for example, the ethics of assignment to control groups in randomized
design. The problem is that these ethical issues are restricted to those that arise over
issues of experimental design. The missing conception is ethics as a dimension for the
assessment of a program. To develop this dimension requires a conceptual linkage that
takes a theory of evaluation, not a handbook of procedures; but in its absence, there is a
great gap in the approach to evaluating the goals of programs, a gap one often
encounters in the field (how does one evaluate abortion clinics and drug programs
without reference to ethics?). One can infer from the fact that neither goals nor
objectives gets referenced in the index that there is not much emphasis on the
evaluation of goals; but there is a strong argument that to omit this is unethical, since
goals (as converted into practice) may be unethical.
Even if not unethical, it is surely important to look at other aspects of goals, such as
legality, feasibility and inconsistency. I take the absence of any effort in this direction
(at least as far as the index and considerable searching effort could determine) to be one
shortcoming in itself. But there is also the problem of unethical practices in a program
with highly ethical objectives. For example, a federal program should surely be
checked, at some level of evaluation, for violations of justice, for example, in the equal
employment area. (This is true in both formative and summative evaluation; and goes
beyond any issues of legality.)
The missing link, conceptually, is to see that just as a program is embedded in
current policy, and current policy in general policy (democratic principles of govern-
ment), so that in turn is embedded in the most general-the overriding-social policy,
namely ethics. Hence program evaluation, policy evaluation, normative political
philosophy and ethical evaluation are part of one hierarchy, which the evaluator must
understand in order to accord the proper respect and weight to considerations from each
level.

Measurement
Again, this gets no direct reference in the index or the table of contents, and the
mention of performance indicators is only in the context of evaluability assessment, not
program evaluation. (There is some discussion of performance measures in the chapter
on outcome monitoring.) One’s general impression is that practical evaluation is
currently rather heavily involved in such matters, so the absence of referenced
discussion is likely to prove disappointing to many readers. There are other major and
minor practical measurement matters on which one might hope to get some guidance
from such a handbook; for example, on the question of the logical legitimacy of using
indicators that are only statistically connected with success in the evaluation of a
particular program (major problem); and on the question of when to use norm-
referenced vs criterion-referenced survey and test instruments.

Synthesis
Here this is taken-for example, but not only, in a valuable chapter on ’Synthesizing
Research Findings’-to mean meta-analysis, i.e. the integration of research studies,
rather than the integration of subevaluations, or of performance data, on multiple
dimensions. We can call the former ’external synthesis’-the items being integrated
are standalone studies with a common topic-and the latter ’internal synthesis’-the
items being integrated are the criteria of merit within a single evaluative investigation.
But the latter is a key element in practical evaluation, and in that very chapter there is
some discussion of how to do it in one particular case-evaluating the studies to be
included in a research synthesis. One needs more than ad hoc discussions of such cases
in order to develop an approach to internal synthesis, just as one needed more than ad
hoc discussion to develop guidelines for practical external synthesis.
There is a significant literature on internal synthesis in the social sciences under the
heading of multiple-attribute utility technology (MAUT)-for example, four mono-
graphs in the Sage series on quantitative methodology-that addresses this issue, albeit
rather shakily. But the MAUT theorists do provide very detailed discussions of internal
synthesis in many typical evaluation examples, and that’s something a practitioner
needs to know about-and know the problems with that particular family of
solutions.
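For readers who have not met that literature, here is a minimal sketch of the kind of linear 'weight and sum' model the MAUT monographs work through, using hypothetical dimensions, weights and single-attribute value functions; where the weights and value functions come from is exactly the question the sketch assumes away, which is one source of the shakiness just mentioned:

# Minimal sketch of a linear MAUT-style internal synthesis. The dimensions,
# weights, value functions and numbers are all hypothetical illustrations,
# not a recommended procedure.

def value_cost(dollars_per_client):
    # Map raw cost per client onto a 0-1 value scale (lower cost is better).
    return max(0.0, min(1.0, 1.0 - dollars_per_client / 1000.0))

def value_outcome(effect_size):
    # Map an outcome effect size onto a 0-1 value scale.
    return max(0.0, min(1.0, effect_size / 0.8))

DIMENSIONS = {
    # name: (weight, single-attribute value function); weights sum to 1
    "cost":    (0.4, value_cost),
    "outcome": (0.6, value_outcome),
}

def internal_synthesis(raw_data):
    """Weighted sum of single-attribute values for one program."""
    return sum(weight * fn(raw_data[name])
               for name, (weight, fn) in DIMENSIONS.items())

print(round(internal_synthesis({"cost": 400.0, "outcome": 0.5}), 3))  # 0.615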

Other Conceptual Confusions


The last thing readers of a handbook on practical evaluation want is a boring list of
definitions. However, before producing such a volume, someone needs to take a good
look at the relevant definitions, and possibly share them with potential contributors.
The chapter on process evaluation, for example, opens with a contrast between it and
impact evaluation: ’process evaluation verifies what the program is, and whether or not
it is delivered as intended to the targeted recipients, and in the intended “dosage”’ (p.
40). Well, that excludes two groups of pretty good process evaluations, as we normally
call them:

1. those that do evaluation-rather than description-of the delivery system, for
example, from a legal or ethical or scientific-content basis; and
2. those that look at who the recipients are, not just at whether the targeted ones
were reached.

This omission is consistent with the editors’ own definition of program evaluation (p.
3), which makes clear that they are excluding the evaluation of process as part of the
evaluation of programs. This seems so much at variance with present use-for
example, by the General Accounting Office, or auditors in general-as to suggest a
need for some explanation of the reasons for going this way. There are other problems
of this kind, for example, the definition of qualitative evaluation would be roundly
rejected by quantitative researchers, since it is highly prejudicial (p. 70).
In general, this is not an ideal way to define concepts. The overall point is simple
enough: conceptual clarity (or its lack) shows up in definitions, and comes from careful
thought at the conceptual level, where the foundations for theories are laid. Absent a
serious conceptual level, which undertakes to explore and explain the connections to
the practical level, practice becomes confused and confusing. Some other omissions
have already been mentioned. A serious one for a practical handbook is the matter of
generalizability (’external validity’, though this is a confusing term). A program is
often valuable because it shows the way and can be exported; it’s an interesting question
how one shows this, and what dimensions of generalizability one should explore. This
connects with the question of the ’significance’ of programs, which goes beyond their
comparative cost-effectiveness (cf. the significance of a black South African winning
the marathon at the 1996 Olympic Games).

Conceptualizing the Discipline of Evaluation


Which brings us to the bottom line. I have been criticizing the Handbook for poor
theoretical infrastructure, but this should not be taken to imply that it lacks overall utility
or is substandard. Its contents are, in my experience, representative of good-
to-excellent work by current authors presenting papers on aspects of evaluation in the
journals and at the professional association meetings of any social-science discipline,
and in the journals and meetings of the evaluation associations. I am arguing for a
major change in this situation, but not in the sense of a simple upgrade of quality.
The needed change is in a global quality, in connectedness-in synergistic payoffs.
The time has come to realize that we now have a well-established discipline of
evaluation, just as we do of measurement or of statistics or experimental design. This
has nothing to do with the question whether we have a single ’theory of evaluation’
about all aspects of evaluation; there are plenty of Bayesians around and plenty of
Fisherians, but no-one takes that to show there’s no discipline of statistics or that one
can define standard deviation however one likes. There will always be differences at
the ideological or foundational level in nearly every discipline-mathematics and
physics are no exception. What a discipline requires is:
1. a definable territory of its own;
2. a basic conceptual framework-a low-level theory-on which there is a
reasonable degree of agreement; and
3. extensive useful practice (in the field, or in working for the clarification or
integration of other disciplines).


Evaluation meets these requirements-and there follows one conception of it which
will serve to demonstrate that until a better one comes along.
(a) Evaluation is the study of the merit, worth, or significance of various entities,
these being three different although connected issues. (In some contexts, merit is close
to quality or effectiveness, and worth to value, cost-effectiveness or efficiency; in
some, evaluation is the same as assessment, in others these are slightly different; in
some, evaluation is the same as appraisal or review.)
(b) The discipline of evaluation includes optimal evaluation practice but also studies
its concepts and the procedures for their investigation.
(c) Evaluation has, in partial analogy with measurement, four basic predicates:
grading, ranking, scoring and apportioning. In general, a different experimental design
is required in order to reach a conclusion with any given one of these predicates in it.
(Confusing other predicates with these, in particular by thinking that a statement of
liking or preference is an evaluative claim, provided much of the basis of the
rationalization for the value-free doctrine.) Note also that these predicates involve no
reference to: causal, explanatory, theoretical, recommendatory or diagnostic concepts
or processes; most descriptive efforts; or the hypothesis-testing approach to research.
Connections to these other matters are incidental or a matter of speculative method-
ology, favored by some evaluators, but not a definitional part of core evaluation
methodology.
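A minimal illustration of the four predicates, using hypothetical program scores, cut-scores and a budget (the extra information each predicate needs is supplied from outside the data, which is part of the point that each calls for a different design):

# Minimal sketch: the same hypothetical scores put to work under each of the
# four evaluative predicates. Scores, cut-scores and budget are invented.
scores = {"Program A": 82, "Program B": 67, "Program C": 91}

# Scoring: the raw numbers themselves.
# Ranking: ordering the programs against one another.
ranking = sorted(scores, key=scores.get, reverse=True)

# Grading: placing each program against absolute standards (cut-scores).
def grade(score):
    return "excellent" if score >= 90 else "adequate" if score >= 70 else "poor"
grades = {name: grade(s) for name, s in scores.items()}

# Apportioning: dividing a fixed resource among the programs evaluated.
budget = 300000
total = sum(scores.values())
allocation = {name: budget * s / total for name, s in scores.items()}

print(ranking)     # ['Program C', 'Program A', 'Program B']
print(grades)      # {'Program A': 'adequate', 'Program B': 'poor', 'Program C': 'excellent'}
print(allocation)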
(d) Although evaluation is often the basis on which decisions are or should be made,
it is not logically wedded to decision-making: evaluation done in historical retrospect,
or for research or training purposes, is just as certainly evaluation, and can just as well
be good evaluation, even when it has no link to decisions (other than the usual
(potential) link of all knowledge to decisions).
(e) Relatedly, evaluation may be done without including any recommendations: they
can be justified, if at all, only by bringing in further data or assumptions about context
and feasibility (and other matters, depending on the case).
(f) Apart from its definitional and conceptual territory, outlined earlier, evaluation
has a set of subject-specific tasks or problems-and associated skills-that belong to it
and it alone, each of them implicitly solved in evaluation practice (with degrees of
validity that vary between applied fields) but now quite well treated in explicit
methodological discussions. These include the problems of: validating evaluative
premises (e.g. by doing needs assessments or functional analysis); synthesizing sets of
value claims that relate to one entity being evaluated (e.g. by various ’weight and sum’
algorithms); controlling evaluative bias (e.g. by ’blinding’); and the validity of using
’empirically validated’ indicators. (These evaluation-specific issues do not include such
matters as whether to use control-group designs, or quantitative rather than qualitative
methodology, although these are often crucial matters in a specific evaluation, as in
specific causal studies anywhere in the social or biological and medical sciences.)
(g) There are half a dozen major applied fields of evaluation, defined by the type of
entity under consideration, in which systematic practice has gone on for longer than
science has existed, each with their own techniques and vocabulary-performance
evaluation (testing), product evaluation, personnel evaluation, etc. (Just as gambling
went on as a systematic practice for millennia before probability theory was invented to
assist it and conceptualize the notions it involves.)
(h) There are a dozen major applied fields of more recent (mainly 20th century)
vintage, including policy studies, proposal evaluation, institutional evaluation, indus-
trial quality control, gemstone grading, technology assessment, the evaluation of
research, etc. These have major overlaps with older fields, but each has its own
evaluative vocabulary, practical methodology and techniques (and journals, institutes,
etc.).
(i) Three major applied fields of evaluation have become visible or significant even
more recently (mainly in the last two decades). These are: (i) intradisciplinary
evaluation, (ii) meta-analysis, and (iii) meta-evaluation.
(i) Intradisciplinary evaluation, without which no discipline can exist, is the
evaluation of the tools of a discipline-of the quality or value or significance of
data, hypotheses, experimental designs, previous work, conclusions, instru-
ments, etc. Its importance includes the fact that it alone makes the doctrine of
value-free science self-contradictory (since ’science’ is an evaluative term,
contrasted with pseudo-science on the basis of intradisciplinary evaluations).
Evaluation is not only logically necessary for all disciplines, but is, along with
logic, one of the only two disciplined processes that all the disciplines share.
Along with other disciplines that provide major tools for at least some
disciplines (and have their own status as an autonomous discipline) it is a
member of the class of ’transdisciplines’ which also includes statistics,
experimental design and measurement.
(ii) Meta-analysis is the technique of research synthesis, now recognized as a
complex and valuable tool in the social-science repertoire, and one that
involves evaluation as an essential element (in deciding which studies to
include).
(iii) Meta-evaluation is the evaluation of evaluations, and, like meta-analysis or
statistical inference, there are a number of ways to do it, each with their own
advantages, disadvantages and appropriate contexts. Perhaps the most impor-
tant feature of meta-evaluation is that it reflects the fact that evaluation is a self-
referent subject; this requires special methodological procedures. A second
feature of meta-evaluation is that utilization is not an intrinsic criterion of merit
for an evaluation. (The two meta-subjects overlap when a set of evaluative
studies is to be synthesized.)

(j) In all of these applied fields, the basic logic of evaluative reasoning, from
empirical and definitional or analytic data to evaluative conclusions, is the same. It is
not deduction and it is not statistical or quantitative probabilistic inference; although it
uses all of these at times, it is more general. It has a long history of validation in the law
as probative or prima facie inference, and in common sense (where it is signalled by
phraseology like ’other things being equal’ and ’on balance’), as well as in informal
social-scientific reasoning. Competence in performing it is essential in evaluation, but
not one of the evaluation-specific tasks. Only very recently have developments in
informal logic begun to crystallize the structure of probative inference, so logic as a
discipline, although as general in its relevance as evaluation, lags in its development.
(Confusing probative inference with incomplete deductions or informal statistics was
another support for the rationalization for the value-free doctrine.)
(k) Many issues concerning types and styles of evaluation-formative or summa-
tive, participatory or distanced, goal-free or goal-based, etc.-are part of the practical
methodology, or the philosophy or sociology, of evaluation, or the context of it, or
represent personal preferences in doing it or some part of it or something related to it
(teaching evaluation, consulting about evaluation, gathering data to support an
evaluation, etc.). Disagreements about them, or about evaluation-related but not
evaluation-specific matters such as the use of qualitative or quantitative approaches,
should not be taken to bear on the question of whether one can establish the existence
and nature of a core discipline, as we have tried to do here.

Implications for the Present and Future of Evaluation


Being competent in evaluation should require some understanding of the nature and
concepts of the core discipline, some analytical (not merely practical) work in more
than one of the applied fields, including some competence in handling the
evaluation-specific tasks, concepts, and components. A fortiori, working as an evalu-
ation consultant, even though not doing evaluations, should require the above and other
skills as well (e.g. pedagogical skills).
What will be the next great leap forward in the practice of evaluation? It seems likely
that it will occur as some general conception of evaluation, perhaps not this one, is used
to integrate, upgrade and extend the work of the applied fields via clarification and
application of core concepts and methodology. The problem with further attempts to
improve practice without a minimal theory, as we see from the Handbook, is that
complexity and confusion are likely to increase.
Evaluation is not only a discipline on which all others depend; it is one on which all
deliberate activity depends. It follows that significant improvements in the core concept
and techniques of evaluation, of which we have seen many in recent years, and of
which many more could be made within the next few years, have the potential for huge
improvement in the quality of life and work, as well as in the level of achievement in all
disciplines.

Notes
1. In fact, the only reference to theory of any kind is one reference to program theory,
i.e. a theory of the operation of the program being evaluated. Should there be more
references to program theory-is a knowledge of program theory a prerequisite for
good program evaluation? The answer to that question requires a conception of
evaluation, and some understanding of the logic of evaluation. It is not addressed in
the Handbook, but it is a fundamental question about practical program evaluation.
2. It is possible that the fact that the senior author of the Handbook got his doctorate in
philosophy provided him with the intellectual fortitude to dismiss the demands of
the constructivists, as it did for this reviewer. The claim here is, however, that the
good sense of avoiding their fate of drowning in philosophical quicksand has to be
combined with retaining just enough spirit of philosophical inquiry to force us, as
practitioners, to look at our assumptions in a less suicidal way. In this respect, it
would appear that the Harvard brand of philosophy leaves its graduates with less
love for the abstract than its Oxford counterpart. (Perhaps also with less love for
generalizations based on n = 2.)

References
Scriven, Michael (1994) ’Product Evaluation-The State of the Art’, Evaluation Practice 15(1):
45-62.
Scriven, Michael (forthcoming) ’The Logic of Checklists’, Evaluation Practice.
Wholey, Joseph, Harry Hatry and Kathryn Newcomer, eds (1994) A Handbook of Practical
Program Evaluation. San Francisco, CA: Jossey-Bass.

MICHAEL SCRIVEN came into evaluation from mathematics and philosophy.


He taught at Berkeley for 12 years, Western Australia for 8, and has held faculty
fellowships at Harvard, the Center for Advanced Study in the Behavioral Sciences,
and the National Science Foundation. Of his 300 publications, about 80 are in the
field of evaluation, in which he now consults and teaches.
