The Theory Behind Practical Evaluation
Michael Scriven
Evaluation, Copyright © 1996 SAGE Publications (London, Thousand Oaks and New Delhi), Vol. 2(4): 393-404
Overview
Mathematicians operate with little concern for the issues in the subject known as the 'foundations of mathematics' or 'metamathematics', because it appears (although this is disputed) that only small parts of mathematics would be significantly affected by different resolutions of the foundational disputes. On the other hand, curriculum and materials developers in school mathematics are highly dependent on theoretical assumptions about learning in general, and about psychological development in the child. It is argued here that the second situation is nearer the truth for program evaluation, and for any other branch of evaluation. This kind of dependence is extremely common, not exceptional: other examples of practical professions with the same high dependence on theory are psychotherapy, computer design, business economics, and management.
To provide a reality check for this discussion, frequent reference will be made to the
collection of 25 essays in A Handbook of Practical Program Evaluation (Wholey et al.,
1994), referred to hereafter as 'the Handbook'. It should be understood, however, that
this is in no way a review of the book, which deserves detailed analysis. In particular, I
base these remarks on general features of the book, and do not assess chapters, because
the latter would require much more detail.
The book has four sections: Evaluation Design (Quantitative and Qualitative
Approaches); Practical Data Collection Procedures; Practical Data Analysis; and
Planning and Managing Evaluation for Maximum Effectiveness. One or more of the
editors writes or co-authors five of the 25 chapters. As you will notice from the section
titles, a great deal of the content overlaps with social-science methodology. At this
1. You cannot do program evaluation without being able to do some personnel evaluation (a program is, after all, just the work of its personnel), especially if you want to finish up making formative recommendations rather than black-box summative judgments; but the term or topic of personnel evaluation never shows up.
2. You cannot do evaluation across the range of federal or state programs without being able to do product evaluation, because many of those programs are either directly or incidentally concerned with the production of educational or other products. However, there is no mention here of how to do product evaluation, or of where to find out how to do it. Is the assumption that any competent program evaluator can do product evaluation? An interesting assumption, but one that needs to be given some support, and that support will of course require a little understanding of product evaluation. (Given that the leading institutions for doing consumer-product evaluation, the consumer associations, are methodologically very confused, this is not a promising line; see Scriven, 1994.) Is the assumption that one just needs to survey some consumers about the quality of the product? That view is exactly as superficial as the view that program evaluation only requires a survey of program recipients.
3. What about proposal evaluation, which is in a sense the evaluation of program scenarios, i.e. of programs before they start? There is a good discussion of evaluability here, but it does not address this question.
4. Nor is there any discussion of the evaluation of RFPs (requests for proposals). (The same discussion of evaluability, under which some aspects of RFP evaluation could be included, skips over this issue, which needs to be addressed directly, since almost all proposal evaluation is done using an invalid model: allocating 100 points across a set of criteria of merit.)
These last two areas, (3) and (4), are ones where the evaluator can have a huge influence on the avoidance of time- and money-wasting activities, and may even be able to get some requirements in about baseline data-gathering, often essential to getting a worthwhile evaluation. But there are other reasons why proposal and RFP evaluation should come into a handbook for program evaluation. For example, many large programs subcontract work, and the way they do it should be examined by the program evaluator. Just as one must look to see whether inequity is present in the appointment or promotion of staff, as part of judging the management of a program, so one must look to see whether invalidity is present in the selection of subcontractors.
Apart from cutting these other fields out of program evaluation, a decision that is at least damaging and sometimes almost fatal, the lack of any overview of evaluation has another bad effect. It narrows the field of view of the program evaluator in a way that forces her or him to reinvent many wheels that were long ago constructed and used in other fields of evaluation. For example, personnel evaluation long ago worked out a valid method for scoring candidates which transfers directly to the integration of the subdimensions of a program evaluation, or to the evaluation of proposals. It avoids one fatal flaw in the standard federal method for doing the latter, namely the failure to require minimum scores on some or (usually) all of the criteria; the sketch below makes the contrast concrete.
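As a minimal sketch (mine, not the Handbook's or the federal procedure's; the criteria, weights, and cutoffs are hypothetical illustrations), the difference between the two rules can be put in a few lines of Python: a weighted-sum score with per-criterion minimums, of the kind personnel evaluation worked out, against the pure 100-point allocation the text calls invalid.

```python
# Hypothetical sketch: hurdle scoring vs the pure 100-point model.
# All criteria, weights, and cutoffs are illustrative, not from the text.

def score_with_hurdles(ratings, weights, minimums):
    """Weighted-sum score with per-criterion minimum ('hurdle') requirements.

    ratings:  criterion -> rating on a common 0-10 scale
    weights:  criterion -> points allocated (summing to 100)
    minimums: criterion -> minimum acceptable rating
    Returns the weighted score, or None if any hurdle is missed.
    """
    for criterion, floor in minimums.items():
        if ratings[criterion] < floor:
            return None  # a disqualifying weakness cannot be bought back
    return sum(weights[c] * ratings[c] for c in weights)

def score_100_points(ratings, weights):
    """The pure 100-point model: the same weighted sum, no hurdle check."""
    return sum(weights[c] * ratings[c] for c in weights)

ratings  = {"design": 2, "staffing": 9, "budget": 9}   # hopeless on design
weights  = {"design": 40, "staffing": 30, "budget": 30}
minimums = {"design": 5, "staffing": 5, "budget": 5}

print(score_100_points(ratings, weights))              # 620: still competitive
print(score_with_hurdles(ratings, weights, minimums))  # None: fails a hurdle
```

Under the first rule a proposal that is hopeless on one criterion of merit can still post a winning total; under the second it is screened out, which is the transferable point drawn here from personnel evaluation.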
In the next four sections, this list is significantly extended, and some reference is made
to other examples of the above. In the final section, we go on to some positive
recommendations.
Measurement
Again, this gets no direct reference in the index or the table of contents, and the
mention of performance indicators is only in the context of evaluability assessment, not
program evaluation. (There is some discussion of performance measures in the chapter
on outcome monitoring.) One’s general impression is that practical evaluation is
Synthesis
Here this is taken (for example, but not only, in a valuable chapter on 'Synthesizing Research Findings') to mean meta-analysis, i.e. the integration of research studies, rather than the integration of subevaluations, or of performance data, on multiple dimensions. We can call the former 'external synthesis', where the items being integrated are standalone studies with a common topic, and the latter 'internal synthesis', where the items being integrated are the criteria of merit within a single evaluative investigation. But the latter is a key element in practical evaluation, and in that very chapter there is
This omission is consistent with the editors' own definition of program evaluation (p. 3), which makes clear that they are excluding the evaluation of process as part of the evaluation of programs. This seems so much at variance with present use (for example, by the General Accounting Office, or by auditors in general) as to suggest a need for some explanation of the reasons for going this way. There are other problems of this kind; for example, the definition of qualitative evaluation would be roundly rejected by quantitative researchers, since it is highly prejudicial (p. 70).
In general, this is not an ideal way to define concepts. The overall point is simple enough: conceptual clarity (or its lack) shows up in definitions, and comes from careful thought at the conceptual level, where the foundations for theories are laid. Absent a serious conceptual level, which undertakes to explore and explain the connections to the practical level, practice becomes confused and confusing. Some other omissions have already been mentioned. A serious one for a practical handbook is the matter of generalizability ('external validity', though this is a confusing term). A program is often valuable because it shows the way: it can be exported. It is an interesting question how one shows this, and what dimensions of generalizability one should explore. This connects with the question of the 'significance' of programs, which goes beyond their
(j) In all of these applied fields, the basic logic of evaluative reasoning, from empirical and definitional or analytic data to evaluative conclusions, is the same. It is not deduction and it is not statistical or quantitative probabilistic inference; although it uses all of these at times, it is more general. It has a long history of validation in the law as probative or prima facie inference, and in common sense (where it is signalled by phraseology like 'other things being equal' and 'on balance'), as well as in informal social-scientific reasoning. Competence in performing it is essential in evaluation, but it is not one of the evaluation-specific tasks. Only very recently have developments in informal logic begun to crystallize the structure of probative inference, so logic as a discipline, although as general in its relevance as evaluation, lags in its development. (Confusing probative inference with incomplete deductions or informal statistics was another support for the rationalization of the value-free doctrine.)
(k) Many issues concerning types and styles of evaluation (formative or summative, participatory or distanced, goal-free or goal-based, etc.) are part of the practical methodology, or the philosophy or sociology, of evaluation, or of its context, or represent personal preferences in doing it, or some part of it, or something related to it (teaching evaluation, consulting about evaluation, gathering data to support an evaluation, etc.). Disagreements about them, or about evaluation-related but not evaluation-specific matters such as the use of qualitative or quantitative approaches, should not be taken to bear on the question of whether one can establish the existence and nature of a core discipline, as we have tried to do here.
Notes
1. In fact, the only reference to theory of any kind is one reference to program theory, i.e. a theory of the operation of the program being evaluated. Should there be more references to program theory? Is a knowledge of program theory a prerequisite for good program evaluation? The answer to that question requires a conception of evaluation, and some understanding of the logic of evaluation. It is not addressed in the Handbook, but it is a fundamental question about practical program evaluation.
2. It is possible that the fact that the senior author of the Handbook got his doctorate in
philosophy provided him with the intellectual fortitude to dismiss the demands of
the constructivists, as it did for this reviewer. The claim here is, however, that the
good sense of avoiding their fate of drowning in philosophical quicksand has to be
combined with retaining just enough spirit of philosophical inquiry to force us, as
practitioners, to look at our assumptions in a less suicidal way. In this respect, it
would appear that the Harvard brand of philosophy leaves its graduates with less
love for the abstract than its Oxford counterpart. (Perhaps also with less love for
generalizations based on n = 2.)
References
Scriven, Michael (1994) 'Product Evaluation: The State of the Art', Evaluation Practice 15(1): 45-62.
Scriven, Michael (forthcoming) ’The Logic of Checklists’, Evaluation Practice.
Wholey, Joseph, Harry Hatry and Kathryn Newcomer, eds (1994) A Handbook of Practical
Program Evaluation. San Francisco, CA: Jossey-Bass.