0191-491X(94)EO009-5
EVALUATION AS A DISCIPLINE
Michael Scriven
Western Michigan University, Kalamazoo, MI, U.S.A.1
Introduction
What we are talking about here is the general discipline of evaluation. It has many
specific and autonomously developed applied areas, of which it's convenient to refer to
half a dozen important ones as 'the Big Six'. These are program evaluation, personnel
evaluation, performance evaluation, product evaluation, proposal evaluation, and policy
evaluation. There are two other applied areas of the greatest importance. One is discussed
only implicitly in this paper: it is meta-evaluation, the evaluation of evaluations. 2 The other
is discipline-specific evaluation, the kind of evaluation that goes on inside a discipline,
sometimes with and often without any assistance from trained evaluators, but always
requiring substantial knowledge of the discipline. It obviously includes the evaluation of
hypotheses, instruments, experimental designs, methods, etc. within a discipline, but here
are some more examples, listed in roughly increasing order of the amount of outside help
they should employ--although they frequently employ less: the evaluation of (i) a new
theory in surface physics (topic-specific); (ii) a review of recent progress and promising
directions in chaos theory (discipline-specific meta-analysis); (iii) a new program in
emergency health care or instruction (application-specific, i.e., it's specific to program
evaluation); (iv) proposals for research support in short-term psychotherapy; (v) several
candidates for a job or for promotion within the department of mathematics; and
(vi) literary criticism, a discipline which is by definition a branch of applied evaluation, but
one with severely limited objectivity and, as usual, in serious need of external review.
Evaluation as a Discipline 149
There are many other fields of evaluation besides these eight, including: curriculum
evaluation; technology assessment; medical ethics; industrial quality control; appellate court
jurisprudence (the legal assessment of legal opinions); and some from our avocational
interests such as wine tasting, art criticism, movie and restaurant reviewing.
Since the philosophy of science deals, amongst other things, with the question of
the nature and logic of scientific propositions, one of the claims that it has to evaluate is the
issue of whether evaluative claims have (or have not) a legitimate place in science. The
most powerful view about this issue, throughout the twentieth century, has been the
doctrine of value-free science, the denial that evaluative claims have any legitimate place in
science. This position of course entailed that there could not be a science of evaluation.
Similar arguments were raised in the humanities, leading to the general conclusion that
there could be no legitimate discipline of evaluation, whether considered as a science or
under some other heading. We will focus here on the issue with respect to science, since
that is the hardest nut to crack.
Many scientists have an interest in the philosophy of science as well as their own
science--as indeed they should--and they often make claims about the value-free doc-
trine. They commonly make the mistake of thinking that their familiarity with scientific
claims means they are in an expert's position with respect to claims concerning the nature
of scientific propositions. In fact, as is suggested by the radical disagreement between
scientists about such claims, they are in possession of at most half of the requisite
expertise, the other half being an understanding of the concepts involved in
epistemological and logical classification schemes. Their relatively amateur status in this
area, 3 combined with their anxiety about the contamination of science by the intrusion of
matters which many of them saw as essentially undecidable--i.e., value judgments--and
hence essentially improper for inclusion in science, led most of them to embrace and
continue to support the doctrine of value-free science. Once that was in place, and widely
supported by the power elite in science, the stage was set for the suppression of any
nascent efforts at developing a general discipline of evaluation. No-one wanted to be
associated with a politically incorrect movement.
Philosophers of science, who should have known better, were for too long influ-
enced by the distinguished group of neo-positivists in their own discipline, descendants of
the group of logical positivists--scientists and philosophers--who first established the
value-free doctrine. Eventually some of them came to abandon the value-free doctrine, but
just as they became willing to consider this possibility, they were hypnotized by the
constructionist/constructivist revolution. So they jumped ship, but into equally bad
company.4
Constructivism is a currently popular derivative from philosophical scepticism,
relativism, or phenomenology (depending on which version of it one considers). It offered
another kind of reason from the ones considered here for thinking that science was not
value-free. Its reasons were centrally flawed--in particular, they were self-refutingS--but
its extensive acceptance has led to the present unusual situation in which there is
widespread agreement that the value-free doctrine is false based on completely invalid
reasons for supposing it to be false. Since the constructionist's reasons lead to the
abandonment of the notion of factuality or objectivity even for descriptive science, their
rejection of the value-free doctrine comes at the price of a simultaneous abandonment of
most of what science stands for. It was in a sense incidental, although important for our
topic here, that constructionism renders impossible the construction of any discipline of
evaluation worthy of the name. Ironically, then, the most widely-accepted revolt against
the doctrine of value-free science in fact generated another argument which made a
discipline of evaluation impossible.
The stance here is that a discipline of evaluation is entirely possible and strictly
analogous to the disciplines of statistics, measurement, and logic. That is, evaluation is a
tool discipline, one whose main aim is to develop tools for other disciplines to use, rather
than one whose primary task is the investigation of certain areas or aspects or constituents
of the world. Such disciplines are here called 'transdisciplines' for two reasons. The first
is that they serve many other disciplines--and not just academic ones. Much of the work
that falls under the purview of a transdiscipline is discipline-specific (but not topic-
specific), e.g., biostatistics, statistical mechanics. The second reason for calling them
transdisciplines refers to the "discipline" part of the term. Each of them has a core
component--an academic core--which is concerned with the more general issues of their
organizing theories or classifications, their methodology, nature, concepts, boundaries,
relationships, and logic. In conventional terms, this is often referred to as the pure subject
by contrast with the applied subject. Thus there are pure subjects of logic, of
measurement, and of statistics. The field of evaluation, alone amongst the transdisciplines,
has always had the applied areas--because practical problems demanded it--but never a
core area. Without that, a field cannot be a discipline, for it cannot have a self-concept, a
definition, integrating concepts, plausible accounts of its limits and basic assumptions, etc.
So the birth of the discipline of evaluation was delayed by these squabbles amongst the
families of the potential parents. Meanwhile, the applied disciplines suffered severely, both
from unnecessary limitations and from the use of invalid procedures.
In saying that a general discipline of evaluation has only very recently emerged, it
should not be supposed that there have been no publications which appear to deal with
such a discipline. There are, for example, many books with the unqualified term
"evaluation" in the title. These would, one might suppose, refer to the general discipline.
Pathetically, however, for six decades they were simply books about student assessment.
That is, they referred to only one part of one applied area in evaluation in one academic
field (performance evaluation in education). More recently, the occurrence of the
unqualified term in the title turns out to be simply referring to program evaluation. In other
cases, a title that referred to "educational evaluation" might lead one to think that the
additional term would entail some inclusion--or at least some mention--of the evaluation
of teaching, administrators, teachers, curriculum, equipment, schools, etc. But while it
used to simply mean 'tests and measurement', more recently, it just means 'program
evaluation in education'. What explains this phenomenon of exaggeration of coverage?
It can be seen as a case of academic nature abhorring a vacuum. In the absence of
any truly general discipline of evaluation, each applied field can think of itself as covering
the general subject. And in a microcosmic way, they do; that is, books on program
evaluation often provide a model of 'evaluation', or at least some remarks about proper
evaluation methodology, which is far more than a mere listing of techniques. But it's far
less than a general model for evaluation, both in breadth and depth. Most of the vacuum
was still there, and its existence was officially endorsed by the value-free doctrine. Low-
level generalizations from the applied fields were no great threat to its legitimacy, although
if you added them all up, the situation was somewhat bizarre--six healthy bastards all said
to have no parents. What was forbidden, as a logical or scientific impropriety--arguments
were given for both claims--was a general account. Nevertheless, it is a bizarre situation
when the whole of science and the teaching of science involves--and cannot continue
without--evaluation, yet the high priests of science still maintain its impropriety.
This was a typical example of the way in which a paradigm can paralyze perception.
One of the classic cases comes from particle physics. Given that electrons have a negative
charge, experimenters supposed that a track of a lightweight particle which curves the
wrong way in a cloud chamber photograph "must have been" due to someone getting
careless with reversing the photographic plate. As we now know, many such photographs
were disregarded--instead of checked, which was easy enough--before someone chal-
lenged the fundamental precept by suggesting that the positron was a real possibility. In
the present case, scientists believed--indeed, most of them wanted to believe--the power
elite's quasi-religious dogma of 'value-free (social) science'. It followed that there could
not be a science of evaluation, indeed any discipline of evaluation. Of course, everyone
knew there was a practice of evaluation, since every one of them as a student had received
grades on their school work and virtually every one of them had given grades to
students--presumably well-justified, factually based grades. People working in testing or
program evaluation realized there were plenty of tricks of the trade, enough to justify a text
on the subject--but it never occurred to them to see that subject as part of a general
discipline, or to use less than the general term to describe their own work, although their
common sense was perfectly well aware that there were half a dozen other applied fields of
evaluation.
Despite the prima facie absurdity of thinking that many fields could be engaged in
easily justified practices which obviously shared many common concepts--ranking,
grading, bias, evaluative validity, etc.--if in fact value judgments were completely
unscientific, the paradigm persisted. It prevented scientists from trying to generalize their
evaluative results to other parts of their own domain, let alone considering the possibility
of a common logic, methodology, and theory that transcended domains. In fact the
paradigm prevented them from trying to study the other fields to see if there were some
practices there from which they could learn. As a result the wheel was reinvented many
times, or, worse, not reinvented.
Instead workers in each field made a point of decrying any suggestions of
similarity. People in personnel evaluation often rejected the idea that they could learn from
the quite sophisticated and much older field of product evaluation, often with some great
insight like: "We can hardly learn from product evaluation, since people aren't products."
One might as well say that cognitive psychology can't learn from computer science since
people aren't computers. The difference in subject matter is undeniable but irrelevant to the
existence of useful analogies and some common logic and methodology.
Had the thought of a general discipline occurred to these writers, they would of
course have made some mention of it in the introduction to their books, or used a less
misleading title. But such a thought was not acceptable and such mentions never occurred.
That doesn't mean they thought it but didn't say it; it means they didn't think it. Their
perceptions and thinking were controlled by the paradigm.
We've talked about what it takes to constitute a discipline. Now, what is this
subject of evaluation that we are talking of making into a discipline? The term "evaluation"
is not used here in any technical sense: we follow common sense and the dictionary.
Evaluation is simply the process of determining the merit or worth of entities, and
evaluations are the product of that process. Evaluation is an essential ingredient in every
practical activity--where it is used to distinguish between the best or better things to make,
get, or do, and less good alternatives--and in every discipline, where it distinguishes
between good practice and bad, good investigatory designs and less good ones, good
interpretations or theories and weaker ones, and so on. It can be done arbitrarily, as by
most wine 'experts' and art critics, or it can be done conscientiously, objectively, and
accurately, as (sometimes) by trained graders of English essays in the state-wide testing
programs. If done arbitrarily in fields where it can be done seriously, then the field
suffers, and the work of all those in the field suffers. For if we cannot distinguish between
good and bad practice, we can never improve practice. We would never have moved out of
the stone age, or even within the stone age from Paleolithic to Neolithic.
It was only because these views were filling a perceived vacuum that they were
generally put forward as theories of evaluation. In fact, they were only theories of program
evaluation. Indeed, they had an even narrower purview. For "program evaluation" has
become a label for only part of what is actually required to do program evaluation, just as
"needs assessment" has in some quarters become a name for a formalized approach that
covers only part of what is required in order to determine needs. In the real world, pro-
gram evaluation always involves some personnel evaluation, should nearly always involve
some evaluation of management systems and some ethical evaluation, and should usually
involve some product evaluation. It will also often benefit from some consideration of pro-
posal evaluation and the evaluation of evaluations. But we'll leave out all these refinements
in this brief overview, and focus on what is conventionally called program evaluation.
The following simplified classification16 begins by identifying six views or ap-
proaches that are alternatives to and predecessors of the one advocated here, the
transdisciplinary view. They are listed below in the order of their emergence into a position
of power in the field of program evaluation since the mid-sixties when the explosive phase
in that field began. In addition to those discussed here there is a range of exotica--
fascinating and illuminating models ranging from the jurisprudential model to the connois-
seurship model--which we pass over for reasons of space.
A. The 'strong decision support' view was an explication of the use of program
evaluation as part of the process of rational program management. This process, implicit in
management practice for millennia, has two versions. The strong version described in this
paragraph conceived of evaluators as doing investigations aimed to arrive at evaluative
conclusions designed to assist the decision-maker. Supporters of this approach pay
considerable attention to whether programs reach their goals, but go beyond that into
questions about whether the goals match the needs they are supposedly addressing,
thereby differentiating themselves from the much narrower relativistic approach listed here
as approach C. Position A was exemplified in, but not made explicit by, the work of Ralph
Tyler,17 and extensively elaborated in the CIPP model of evaluation (Context, Input,
Process, and Product) (Stufflebeam et al., 1971). The CIPP model goes beyond the
rhetoric of decision support into spelling out a useful systematic approach covering most
of what is involved in program evaluation, and uses this to infer evaluative conclusions.
Dan Stufflebeam, who co-authored the CIPP model, has continued to play a leading role in
evaluation, still representing--and further developing--this perspective. By contrast,
Egon Guba, one of his co-authors in the early CIPP work, has now gone in a quite
different direction--see F below. This approach, although this particular conclusion was
more implicit than explicit, clearly rejected the ban on evaluation as a systematic and
scientific process. It was not long, however, before recidivism set in, as we see in the next
four accounts.
B. The 'weak decision support' view. The preceding approach has often been
described as the 'decision support' approach but there is another approach which also
claims that title. It holds that decision support provides decision-relevant data but stops
short of drawing evaluative conclusions or critiquing program goals. This point of view is
represented by evaluation theorists such as Marv Alkin who define evaluation as factual
data gathering in the service of a decision-maker who is to draw all evaluative
conclusions. 18 This position is obviously popular amongst those who think that true
science cannot or should not make value judgments, and it is just the first of several that
found a way to do what they called program evaluation while managing to avoid
actually drawing evaluative conclusions. The next position is somewhat more like
evaluation as we normally think of it, although it still manages to avoid drawing evaluative
conclusions. This is:
C. The 'relativistic' view. This was the view that evaluation should be done
by using the client's values as a framework, without any judgment by the evaluator about
those values or any reference to other values. The most widely used text in evaluation is
written by two social scientists and essentially represents this approach (Rossi &
Freeman, 1989). B and C were the vehicles that allowed social scientists to join the
program evaluation bandwagon. 19 The simplest form of this approach was developed into
the 'discrepancy model' of program evaluation by Malcolm Provus (the discrepancies
being divergences from the projected task sequence and timeline for the project). Program
monitoring as it is often done comes very close to the discrepancy model. This is a long
way from true program evaluation for reasons summarized below. It is best thought of as a
kind of simulation of an evaluation: as in a simulation of a political crisis, the person
staging the simulation is not, in that role, drawing any evaluative conclusions. Of course,
it's a little more quaint for someone who is not drawing any evaluative conclusions to refer
to themselves as an evaluator.
D. The 'rich description' approach. This is the view that evaluation can be done
as a kind of ethnographic or journalistic enterprise, in which the evaluators report what
they see without trying to make evaluative statements or infer to evaluative conclusions--
not even in terms of the client's values (as the relativist can). This view has been very
widely supported--by Bob Stake, the North Dakota School, many of the UK theorists,
and others. It's a kind of naturalistic version of B; it usually has a flavor of relativism
about it, reminiscent of C--in that it eschews any evaluative position; and it sometimes
looks like a precursor of the constructivist approach described under F below, in that it
focuses on the observable rather than the inferrable. More recently, it has been referred to
as the 'thick description' approach--perhaps because "rich" sounds evaluative?
E. The 'social process' school. This was crystallized about 12 years ago,
approximately half way to the present moment in the history of the emerging discipline, by
a group of Stanford academics led by Lee Cronbach, referred to here as C&C (for
Cronbach and Colleagues; Cronbach et al. 1980). It is notable for its denial of the impor-
tance of summative evaluation, i.e., evaluation (i) as providing support for external
decisions about programs, or (ii) to ensure accountability. The substitute they proposed for
evaluating programs in anything like the ordinary sense was understanding social pro-
grams, 20 flavored with a dash of helping them to improve. Their position was encapsulated
in a set of 95 theses. This paper may perhaps represent an implementation of the 87th in
their list, which states: "There is need for exchanges [about evaluation] more energetic than
the typical academic discussion and more responsible than debate among partisans"--if
indeed there is any such middle ground.
Ernie House, a highly independent thinker about evaluation as well as an
experienced evaluator, also stressed the importance of the social ambience but was quite
distinctive in his stress on the ethical and argumentation dimensions of evaluation. In fact
his stress on the ethical dimension was partly intended as a counterpoint to the absence of
this concern in C&C (House, 1989).
F. The 'constructivist' or 'fourth generation' approach, representing the most re-
cent riders on the front of the wave, notably Egon Guba and Yvonna Lincoln (1989), but
with many other supporters including a strong following in the USA and amongst UK
evaluators. This point of view rejects evaluation as a search for quality, merit, worth, etc.,
in favor of the idea that it--and all truth, such as it is in their terms--is the result
construction by individuals and negotiation by groups. This means that scientific
knowledge of all kinds is suspect, entirely challengeable, in no way objective. So, too, is
all analytic work such as philosophical analysis, including their own position. Out goes the
baby with the bathwater. Guba has always been aware of the potential for self-
contradiction in this position; in fact, there is no way around its suicidal bent.
Comments
Now, the commonsensical view of program evaluation is probably the view that it
consists in "working out whether the program is any good". It's the counterpart, people
might say, of the sort of thing doctors, road-testers, engineers, and personnel interviewers
do, but with the subject matter being programs instead of patients, cars, structures, or
applicants. The results of this kind of investigation are of course direct evaluative
conclusions--"The patient/program has improved slightly under the new therapeutic/managerial regime", etc. Of the views listed above, the strong decision support view, of
which CIPP is the best known elaboration, comes closest to this.
The CIPP model was originally a little overgeneralized in that it claimed all
(program) evaluation was oriented to decision support. It seems implausible to insist that a
historian's evaluation of the "bread and circuses" programs of Roman emperors, or even
of the WPA, is or should be designed to serve some contemporary decision maker rather
than the professional interest of historians and others concerned with the truth about the
past. One must surely recognize the 'research role' of evaluation, the search for truth about
merit and worth, whose only payoffs are insights. Much of the decision support kind of
evaluation, and all of the research type exemplify what is sometimes called summative
evaluation--evaluation of a whole program of the kind that is often essential for someone
outside the program. One might also argue, contra the original version of CIPP, that
formative evaluation--evaluation aimed at improving a program or performance, reported
back to the program staff--deserves recognition as having a significantly different role
than decision support and its importance slightly weakens the claim that evaluation is for
decision support. (Of course, it supports decisions about how to improve the program, but
that's not the kind of decision that decision support is normally supposed to support.)
Over the years, however, CIPP has developed so that it accepts these points and is a fully-
fledged account of program evaluation; and its senior author has gone on to lead research
in the field of personnel evaluation.
While CIPP remains an approach to program evaluation, it comes to conclusions
about program evaluation that are very like those entailed by the transdisciplinary model.
The differences are like those between two experienced navigators, each of them with their
own distinctive way of doing things, but each finishing up--or else how would they live
to be experienced?--with very similar conclusions. Of matters beyond program
evaluation, and in particular, of the logic and core theory of evaluation, CIPP does not
speak, and those are the matters on which the transdisciplinary view focuses above all
others.
The other entries in the list above--that is, almost all schools of thought in evaluation--can be seen as a series of attempts to avoid direct statements about the merit or worth
of things. Position B avoids all evaluative conclusions; C avoids direct evaluative claims in
favor of relativistic ones; 21 D avoids them in favor of non-evaluative description; E avoids
them in favor of insights about or understanding of social phenomena; and F rejects their
legitimacy along with that of all other claims.22
This resistance to the commonsense view of program evaluation--even amongst
those working in the field--has its philosophical roots in the value-free conception of the
social sciences, discussed above, but it also gathered support from another argument,
which appears at first sight to be well-based in common sense. This was the argument that
the decision whether a program is desirable or not should be made by policy-makers, not
by evaluators. On this view it would be presumptuous for program evaluators to act as if it
were their job to decide whether the program they were called in to evaluate should exist.
That argument confuses evaluations--which evaluators should produce--with
recommendations, which they are less often in a good position to produce (although they
often do produce them), and which are frequently best left to executives close to the
political realities of the decision ambience. That such a confusion exists is further evidence
of the lack of clarity about fundamental concepts in the general evaluation vocabulary.
Evaluators have all too often overstepped the boundaries of their expertise and walked on
to the turf of the decision maker, who rightly objects. But it is not necessary to react to the
extent of the weak decision support position and others that draw the line too early, cutting
the evaluator off even from drawing evaluative conclusions.
The issue must now be addressed of how the view supported in this paper, referred
to as the 'transdisciplinary' view, compares with the above. The transdisciplinary view
extends the commonsense view but is significantly different from A, and radically different
from all the rest.
On this view, the discipline of evaluation has two components: the set of applied
evaluation fields, and the core discipline, just as statistics and measurement have these two
components. The applied fields are like other applied fields in their goals, namely to solve
practical problems. This means finding out something about what they study, and what
they study is the merit and worth of various entities--personnel, products, etc. The core
discipline is aimed to find out something about the concepts, methodologies, models,
tools, etc. used in the applied fields of evaluation, and in other fields which use evaluation.
This, as we have suggested, includes all other disciplines--craft and physical as well as
academic. Hence the transdiscipline of evaluation is concerned with the analysis and
improvement of a process that extends across the disciplines, giving rise to the term.
Consider statistics more closely. There is a core discipline, studied in the
department of mathematics or in its own academic department. This is connected to the
applied fields of, for example, biostatistics, statistical mechanics, and demographics. The
applied fields' main tasks are the study and description of certain quantitative aspects of the
phenomena in those fields, and the study and development of field-specific quantitative
tools for describing that data and solving problems on which it can be brought to bear. The
more general results coming from the core discipline apply across all the disciplines that
are using--or should be using--statistics, hence the term "transdiscipline"; but it also
helps develop field-specific techniques, attending in particular to the soundness of their
fundamental assumptions and hence the limits of their proper use.
Both evaluation and statistics are of course widely used outside their recognized
applied fields, i.e., the ones with "evaluation" or "statistics" in their title. That wider use is
part of the subject matter of the core discipline in both cases. Statistics must consider the
use of statistics wherever it is used, not just in areas that have that word in their title.
Looking at other transdisciplines, logic has its own applied fields--the logic of the social
sciences, etc.--and is of course widely used outside those named fields. So it is an
extremely general transdiscipline. But evaluation is probably the most general--unlike
logic, it precedes language--and both are much more general than measurement or
statistics.
The transdisciplinary view of evaluation has four characteristics that distinguish it
from B-F on the previous list; one epistemological, one political, one concerning
disciplinary scope, and one methodological.
(I) It is an objectivist view of evaluation, like A. It argues for the idea that the evaluator
is determining the merit or worth of, for example, programs, personnel or products; that
these are real although logically complex properties of everyday things embedded in a
complex relevant context; and that an acceptable degree of objectivity and comprehen-
siveness in the quest to determine these properties is possible, frequently attained, and a
goal which can be more frequently attained if we study the transdiscipline. This contrasts
with B-F for obvious reasons. (There is some contrast with the early form of A, in the
shift of the primary role from decision-serving to truth-seeking.)
Since an objectivist position implies that it is part of the evaluator's job to draw direct evaluative conclusions about whatever is being evaluated (e.g., programs), the
position requires a head-on attack on the two grounds for avoiding such conclusions. So
the transdisciplinary position:
(i) Explicitly states and defends a logic of inferring evaluative conclusions from
factual and definitional premises; and
(ii) Spells out the fallacies in the arguments for the value-free doctrine.23
(II) The approach here is a consumer-oriented rather than a management-
oriented (or mediator-oriented, or therapist-oriented) approach to program evaluation--and
correspondingly to personnel and product evaluation, etc. This does not mean it is a
consumer-advocacy approach in the sense that 'consumerism' sometimes represents--that
is, an approach which only fights for one side in an ancient struggle. It simply regards the
consumer's welfare as the primary justification for having a program, and accords that
welfare the same primacy in the evaluation. That means it rejects 'decision support'--
which is support of management decisions--as the main function of evaluation (by
contrast with B), although it aims to provide (management-)decision support as a
byproduct. Instead, it regards the main function of an applied evaluation field to be the
determination of the merit and worth of programs (etc.) in terms of how effectively and
efficiently they are serving those they impact, particularly those receiving--or who should
be receiving--the services the programs provide, and those who pay for the program--
typically, taxpayers or their representatives. While it is perfectly appropriate for the welfare
of program staff to also receive some weighting, schools--for example--do not exist
primarily as employment centers for teachers, so staff welfare (within the constraints of
justice) cannot be treated as of comparable importance to the educational welfare of the
students.
To the extent that managers take service to the consumer to be their primary goal--
as they normally should if managing programs in the public or philanthropic sector--
information about the merit or worth of programs will be valuable information for
management decision making (the interest of the two views that stress decision support);
and to the extent that the goals of a program reflect the actual needs of consumers, this
information will approximate feedback about how well the program is meeting its goals
(the relativist's concern). But neither of these conditions is treated as a presupposition of
an evaluation; they must be investigated and are often violated.
The consumer orientation of this approach moves us one step beyond establishing the
legitimacy of drawing evaluative conclusions--Point I above--in that it argues for the
necessity of doing so--in most cases. That is, it categorizes any approach as incomplete
(fragmentary, unconsummated) if it stops short of drawing evaluative conclusions. The
practical demonstration of the feasibility and utility of going the extra step lies in every
issue of Consumer Reports: The things being evaluated are ranked and graded in a
systematic way, so one can see which are the best of the bunch (ranking) and whether the
best are safe, a good buy, etc. (grading), the two crucial requirements for decision-
making.
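The ranking/grading distinction can be sketched in a few lines of Python. The products, scores, cut-offs, and the "safety" criterion below are all invented for illustration, not drawn from Consumer Reports; the point is only that ranking is comparative while grading is judged against absolute standards.

```python
# Hypothetical product data -- names, scores, and prices are invented.
products = {
    "Model A": {"score": 88, "safe": True,  "price": 120},
    "Model B": {"score": 93, "safe": True,  "price": 310},
    "Model C": {"score": 71, "safe": False, "price": 95},
}

# Ranking: order the candidates by overall merit, best first.
ranking = sorted(products, key=lambda p: products[p]["score"], reverse=True)

# Grading: place each candidate against absolute standards,
# regardless of how the other candidates performed.
def grade(name):
    info = products[name]
    if not info["safe"]:
        return "Not Acceptable"  # safety treated as an absolute bar
    if info["score"] >= 85 and info["price"] <= 200:
        return "Best Buy"
    return "Acceptable"

grades = {name: grade(name) for name in products}
```

Note that the two operations can disagree in decision-relevant ways: the top-ranked item here is not the "Best Buy", which is exactly why both are needed.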
(III) Third, the approach here is a generalized view. It is not just a general view; it
involves generalizing the concepts of evaluation across the whole range of human
knowledge and practice. So, unlike any of the views A-F, it treats program evaluation as
merely one of many applied areas within an overarching discipline of evaluation. (These
applied areas may also be part of the subject matter of a primary discipline: personnel
evaluation, for example, is part of (industrial/organizational) psychology, biostatistics is,
in a sense, part of biology.) This perspective leads to substantial changes in the range of
considerations to which, e.g., program evaluation must pay attention (for instance, it must
look at other applied evaluation areas for parallels, and to a core discipline for theoretical
analyses), but helps with the added labors by greatly enhancing the methodological
repertoire of program evaluation.
Spelling out the directions of generalization in a little more detail, the
transdisciplinary view stresses:
(a) the large range of distinctive applied evaluation fields. The leading entries
are the Big Six plus meta-evaluation (the evaluation of evaluations). There
are at least a dozen more major entries, ranging from technology
assessment to ethical analysis.
(b) the large range of evaluative processes in fields other than applied evaluation
fields, including all the disciplines (the intradisciplinary evaluation process--
the evaluation of methodologies, data, instruments, research, theories, etc.)
and the practical and performing arts (the evaluation of craft skills,
compositions, competitors, regimens, instructions, etc.).
(c) the large range of types of evaluative investigation, from practical levels of
evaluation (e.g., judging the utility of products or the quality of high dives in
the Olympic aquatic competition) through program evaluation in the field to
conceptual analysis (e.g., the evaluation of conceptual and theoretical
solutions to problems in the core discipline of evaluation).
(d) the overlap between the applied fields, something that is rarely recognized.
For example, methods from one field often solve problems in other fields,
yet 'program evaluation' as usually conceived does not include any reference
to personnel evaluation, proposal evaluation, or ethical evaluation, each of
which must be taken into account in a good proportion of program
evaluations.
(IV) The transdisciplinary view is a technical view. This has to be stated rather carefully,
because we need to distinguish between the fact that many evaluations, for example large
program evaluations, require considerable technical knowledge of methodologies from
other disciplines; and the fact that is being stressed here, that evaluation itself, over and
above these 'auxiliary' methodologies, has its own technical methodology. That
methodology needs to be understood by anyone doing non-trivial evaluation in any field at
all. It involves matters such as the logic of synthesis and the differences among
evaluation functions such as grading, scoring, ranking, and apportioning. Not all evaluators
need to know anything about social science methodologies such as survey techniques; but all
must understand the core logic or risk serious errors. It has been common for those
working in and teaching others program evaluation to stress the need for skills in
instrument design, cost analysis, etc. But they have commonly supposed that such matters
exhausted the range of technical skills required of the evaluator. On the contrary, they are
the less important of the two groups of such skills.
Stressing this does not minimize the fact that across the whole range of evaluation
fields, an immense number of 'auxiliary' methodologies are needed, far more than with
any of the other transdisciplines. There are more than a dozen auxiliary methodologies
involved in even the one applied field of program evaluation, more than half of them not
covered in any normal doctoral training program in any single discipline such as sociology,
psychology, law, or accounting.
Conclusion
Program evaluation treated in isolation can be seen in the ways all six positions
advocate. But program evaluation treated as just one more application of the logical core
which leads us to solid evaluative results in product evaluation, performance assessment,
and half a dozen other applied fields of evaluation, can hardly be seen as consistent with
the flight from direct evaluative conclusions that five of those positions embody.
While there are special features of program evaluation which often make it less
straightforward than the simpler kind of product evaluation, the reverse is often the case.
The view that it is different from all product evaluation is only popular amongst those who
know little about product evaluation. For example, the idea that program evaluation is of
its nature much more political than product evaluation is common but wrong; the histories of
the interstate highway system and the superconducting supercollider are counter-examples,
and it was after all 'only' a product evaluation--one commissioned by Congress and done
flawlessly--that led to the dismissal of the Director of the National Bureau of Standards.24
One must conclude that the non-evaluative models of program evaluation discussed
here (B-F) are completely implausible as models for all kinds of evaluation. And it is
extremely implausible to suppose that program evaluation is essentially different from
every other kind of evaluation. The transdisciplinary view, on the other hand, applies
equally to all kinds of evaluation, and that consistency must surely be seen as one of its
appeals. For the various evaluative purposes addressed by the authors of the other papers in
this issue, it may also be of some value to see what they are doing as part of a single,
larger, enterprise, and hence as parallel to what workers in other applied fields of
evaluation are doing. In that perception there is a prospect of many valuable results which
should serve to revitalize several areas and sub-areas of evaluation. And the second edge
of the transdisciplinary sword cannot be ignored: the demonstration of fundamental errors
in applied evaluation fields such as personnel evaluation and program evaluation due to the
neglect of the core discipline.
Notes
1. The author welcomes comments and criticisms of all kinds; they should be addressed to him at P.O.
Box 69, Point Reyes, CA 94956 (scriven@aol.com on the Internet), or faxed to (415) 663-1511.
The reflections reported here were produced while working part-time on the CREATE staff, although
mostly on my own time since my work for CREATE is primarily concerned with the specifics of
teacher evaluation. However, even when working on the specific topic, there is a need to examine
foundations in order to deal with questions of validity, and some remarks about the connection are
included here.
2. CREATE works mainly on personnel policy and program evaluation, and on institutional evaluation
which combines program and personnel evaluation. It also does considerable meta-evaluation.
3. Closely analogous to the amateur status of applied mathematicians about matters in the foundations of
mathematics, and not unlike the status of a bookmaker with respect to probability theory. A high
degree of skill in an applied field does not automatically generate any skill in the theory of the field, let
alone meta-fields such as the sociology or history of the subject, or the logical analysis of
propositions in the field.
4. The discussion here is only intended to provide a brisk overview of this technical area. Further details
and references will be found in the relevant articles in An Evaluation Thesaurus (1991).
5. Since if constructionism were true, its arguments would prove that the claim that it is true has no
validity for those who do not construct reality in the same way as those who think it true, i.e., those
who disagree with it. That is, it is no more true than its denial, which means it is not true in the only
sense of that term that ever made it to the dictionaries or into logical or scientific usage.
6. By contrast with a vocabulary of standard signs, lacking grammar and hence recombination
capability.
7. The definitive reference is The employment interview: Theory, research, and practice, Eder and Ferris
(1989).
8. The term "indicator" is here used to refer to a factor that is not a criterion (i.e., not one of the defining
components of the job). Hence good simulations--e.g., the classical typing test for selecting
typists--are exempt from these remarks, which apply only to 'empirically validated' indicators such
as performance on proprietary tests or demographic variables.
9. It is discussed at greater length in the writer's contribution to Research-Based Teacher Evaluation
(1990). It is still denied by leading specialists in personnel evaluation, many of whose standard
procedures are threatened by it (the use of 'empirically validated' tests).
10. There are some cases where the contribution has been and may be significantly different from zero.
For example, when the theory violates a paradigm, a specialist in the evaluation of paradigms--
someone with a background in both history and philosophy of science--may be able to contribute a
useful perspective or analogy.
11. E.g., they will include Excellent, Above Average, Average, Below Average, Unacceptable. Since the
average performance may be excellent, or unacceptable, this fails to meet the minimum requirement
for a scale (exclusive categories). The converse error is a scale like this: Outstanding, Good,
Acceptable, Weak, Weakest. 'Grading on the curve' is another good example of total category
confusion, for well-known reasons.
12. Many other examples are given in ET4.
13. By early next year, the present author hopes to do this in a monograph for the Sage Social Science
Methodology Series called General Evaluation Methodology.
14. The most obvious is that the standard procedure, which allocates say 100 points across half a dozen
dimensions of merit, ignores the existence of absolute minima on some of the scales. This means--to
give an extreme example--that a proposal which happens to be on the wrong topic, but which is
staffed by great staff from a great institution at a modest price, could in principle win a competitive
grant by picking up enough points on several of the dimensions to overcome its zero on relevance.
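The arithmetic of this flaw can be sketched in Python. All weights, scores, and the minimum threshold below are invented for illustration, and the "absolute floor" check is one possible repair, not a procedure the paper prescribes:

```python
# Invented 100-point allocation across dimensions of merit.
WEIGHTS = {"relevance": 30, "staff": 25, "institution": 25, "cost": 20}
# Hypothetical absolute floor (in points) that a pure weighted sum ignores.
MINIMA = {"relevance": 10}

def point_total(scores):
    """Classic point allocation: weighted sum over the dimensions (scores in 0..1)."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

def meets_minima(scores):
    """Screen out any proposal that falls below an absolute minimum on a scale."""
    return all(WEIGHTS[d] * scores[d] >= floor for d, floor in MINIMA.items())

# Great staff and institution at a modest price, but on the wrong topic:
wrong_topic = {"relevance": 0.0, "staff": 1.0, "institution": 1.0, "cost": 0.9}
# A merely good proposal that is actually on topic:
on_topic = {"relevance": 0.8, "staff": 0.6, "institution": 0.6, "cost": 0.6}

# Raw points come out 68 vs 66: the mis-directed proposal wins the point
# count, but fails the absolute minimum and should be screened out first.
```

The design point is that the floor check must be applied before, not folded into, the weighted sum; no number of points elsewhere can compensate for falling below an absolute minimum.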
15. Some of this section is an improved version of parts of a much longer article, "Hard-Won Lessons in
Program Evaluation" in the June 1993 issue of New Directions for Program Evaluation (Jossey-
Bass).
16. This is an improved version of a classification that appeared in Scriven (1993).
17. Although he is often wrongly thought of as never questioning program goals.
18. Alkin recently reviewed his original definition of evaluation after 21 years, and still could not bring
himself to include any reference to merit, worth, or value. He defines it as the collection and
presentation of data summaries for decision-makers, which is of course the definition of MIS
(management information systems). See pp. 93-96 in Alkin (1991).
19. By contrast, Position A was put forward by educational researchers, who were less committed to the
paradigm of value-free social science, possibly because their discipline includes history and
philosophy of education, comparative education, and educational administration, which have quite
different paradigms.
20. This attempt to replace evaluation with explanation is reminiscent of the last stand of psychotherapists
faced with the put-up-or-shut-up attitude of those doing outcome studies in the 1950s and 1960s. The
therapists, notably the psychoanalysts, tried to replace remediation with explanation, arguing that the
payoff from psychotherapy was improved understanding by the patients of their conditions, rather
than symptom reduction. This was not a popular view amongst patients who were in pain and paying
heavily to reduce it--they thought.
21. A relativistic evaluative statement is something like: "If you value so-and-so, then this will be a good
program for you", or "The program was very successful in meeting its goals", or "If technology
education should be accessible to girls as easily as boys, then this program will help bring that about".
These claims--of course, these are simple examples--express an evaluative conclusion only relative
to the client's or the consumer's values. A direct evaluative claim, by contrast, while it can be
'relativistic' in another sense--that is, comparative or conditional--will contain an evaluative claim by
its author, about the program under evaluation. For example: "This program is not cost-effective
compared to the use of traditional methods", or "This is the best of the options", or "These side-effects
are extremely unfortunate".
22. The connoisseurship model also weakens the evaluative component in evaluation, reducing it to the
largely subjective model of a connoisseur's judgments. The connoisseur is highly knowledgeable, but
the knowledge is in a domain where it only changes but does not validate its owner's evaluations.
23. These points are covered in some detail in the Evaluation Thesaurus entries on the logic of evaluation
and not repeated here since the arguments are of rather specialized interest, although the issue is of
crucial importance.
24. In the famous Astin case, the Director was asked to do a study of the effect of the battery
additive AD-X2 prior to governmental purchase of it for the vehicle fleet. The additive had no effect,
as was apparent from a simple control-group study of government vehicles, and reporting that result
cost Astin his job (although media pressure eventually got him reinstated). A look at the process of
evaluation for textbooks, and its political ambience, provides what may be an even clearer example of
product evaluation as involving the same political dimensions as program evaluation.
References
Alkin, M. (1991). Evaluation theory development: II. In M. McLaughlin, & D. Phillips (Eds.),
Evaluation and education at quarter century (pp. 91-112). Chicago: NSSE/University of Chicago Press.
Cronbach, L.J., Robinson Ambron, S., Dornbusch, S.M., Hess, R.D., Hornik, R.C., Phillips, D.C.,
Walker, D.F., & Weiner, S.S. (1980). Toward reform of program evaluation: Aims, methods, and
institutional arrangements. San Francisco: Jossey-Bass.
Eder, R.W., & Ferris, G.R. (Eds.). (1989). The employment interview: Theory, research and practice.
Newbury Park, California: Sage.
Guba, E., & Lincoln, Y. (1989). Fourth generation evaluation. Newbury Park, California: Sage.
Rossi, P.H., & Freeman, H.E. (1989). Evaluation: A systematic approach. Newbury Park, California: Sage.
Scriven, M. (1990). Can research-based teacher evaluation be saved? In Richard L. Schwab (Ed.),
Research-based teacher evaluation (pp. 12-32). Boston: Kluwer.
Scriven, M. (1993). Hard-won lessons in program evaluation. New Directions for Program Evaluation
(June). San Francisco: Jossey-Bass.
Spector, P. (1992). Summated rating scale construction. Newbury Park, California: Sage.
Stufflebeam, D.L., Foley, W.J., Gephart, W.J., Guba, E.G., Hammond, R.L., Merriman, H.O., &
Provus, M.M. (1971). Educational evaluation and decision making. Itasca, IL: Peacock.
The Author
MICHAEL SCRIVEN has degrees in mathematics and philosophy from Melbourne and
Oxford, and has taught and published in those areas and in psychology, education,
computer science, jurisprudence, and technology studies. He was on the faculty of
UC/Berkeley for twelve years. He was the first president of what is now the American
Evaluation Association, founding editor of its journal (now called Evaluation Practice),
and recipient of its Lazarsfeld Medal for contributions to evaluation theory.