
Evaluation Research in Education

Originally prepared by Professor Harold Silver.

Component now led by Dr. Nick Pratt.

H Silver, Faculty of Education, University of Plymouth, 2004

(links reinstated August 2006)

CONTENTS

1 Questions

2 What is evaluation?

Definitions
Types of evaluation
o Process and impact
o Formative and summative

3 Research?

Different or the same?

4 Methods

5 Internal evaluation

6 The evaluator

Examples

7 Advice

8 References and further reading

9 Tasks

1 Questions

The questions to be addressed are interrelated and can be summarised as:

What is evaluation?

Is it research?

Why evaluate?

How and by whom?

Evaluation, under that name, has become a widespread international activity since the 1960s,
in a variety of contexts. Although we are focusing on education here, it is important to
remember that evaluation models have been developed elsewhere, notably in the social
sciences. It has been used to test the effectiveness of, for example, national and international
programmes in agriculture or crime prevention, health improvement or transport policy. A
postgraduate course in applied social science at Manchester University was introduced as
follows:

Increasingly, social service providers, programme administrators and legislators use evaluation
research in order to consider the effectiveness of new and existing programmes, procedures
and/or interventions at producing some form of outcome or change. The findings from
evaluations focus on the strengths and weaknesses of various aspects of innovations as well as of
their overall outcome. This information is, in turn, used to consider how such interventions
might be modified, enhanced or even eliminated in the effort to provide a better service, fulfil a
particular need or meet a specific challenge.

In education, evaluation has served a somewhat similar purpose, and has been applied to major
programmes of whole-school reform or specific curriculum changes, and more limited projects
to try out innovations. There is a vast literature on different types and purposes of evaluation,
and we shall sample some of it here as we address the priority questions. Some of the discussion
overlaps with issues discussed in other RESINED components such as action research,
qualitative research and interviewing, and the links will be highlighted as they occur.

2 What is evaluation?

Definitions

At the lowest level evaluation is a regular social activity, such as that conducted by Which?
magazine and other publications, and by ourselves. It makes comparisons amongst products or
services, with a view to making a selection: a kitchen utensil or an investment, a car or a
Chardonnay. At this level evaluation is comparative on the basis of relatively straightforward
criteria and available information, and is a preliminary to decision-making. The criteria, of
course, are not the same for everyone evaluating: comfort may or may not override the cost or
style of a car, and labelling may or may not influence choice of a wine. In education the purposes
and the criteria are inevitably more complex, and evaluation is a process of acquiring
information. Evaluation of an innovation or an activity, a curriculum or organisational change,
raises a series of sometimes difficult or contentious issues. Who is sponsoring the evaluation,
what do they want to know, and why do they want to know it? What depends on the outcomes:
more or less finance, promotion or redundancy? What is the salient issue for the evaluation:
change in student learning, staff development, value for money, position in a league table?
Whose opinion counts most: students' feedback in the university, the teachers' perceptions in
the school, the project managers', the administrators'?

Evaluation in education therefore encompasses competing criteria and purposes, and is situated
in potentially sensitive political and ethical contexts.

If you will be undertaking a 'task' at the end of this component you may find it helpful
to make some notes as you go along. At this point you could make a preliminary list of
problems you think might be encountered in evaluating a new initiative in your own institution.

It is important to note that evaluation research (a concept discussed below) is basically what is
commonly called programme or project evaluation. The features of such evaluation (in its
various forms) may be the same or similar at all levels of education; it concerns innovations,
initiatives and developments of many kinds, and it is mostly conducted by individual evaluators
or small teams. There are, however, other forms of evaluation that are not included in the
discussion here. These include, commonly in higher education, the evaluation of teaching quality
or of research, or the evaluation of institutions, as part of a system approach to quality assurance
conducted by national agencies. Teaching quality and institutional evaluation may also be
conducted internally as a form of self-evaluation (eg Ellington and Ross, Evaluating teaching
quality throughout a university [Robert Gordon University], and Adelman and Alexander, The
Self-Evaluating Institution).

Definitions of evaluation can indicate the intentions involved, but are elusive as complete
explanations. The kind of definition that was often used in the 1950s and 1960s, notably in the
United States, was:

Evaluation is the systematic assessment of the worth or merit of some object.

The judgmental tenor of that definition in fact reflects the evaluation of cars or Chardonnays:
assessing their worth or merit in order to choose, though it does not reflect the casual nature of
personal judgments, which are often unsystematic. Subsequent attempts to define evaluation
have adapted this formulation. Trochim, in the United States, for example, suggests:

Evaluation is the systematic acquisition and assessment of information to provide useful
feedback about some object.

He explains the older and the revised versions, which both agree that evaluation is systematic
and use 'object' to refer to a programme, policy, technology, person, need, activity and so on.
The revised definition, however, emphasizes acquiring and assessing information rather than
assessing worth or merit 'because all evaluation work involves collecting and sifting through
data, making judgements about the validity of the information and of inferences we derive from
it, whether or not an assessment of worth or merit results' (Trochim, website). Whether
evaluation makes judgments, or is a preliminary to other people making judgments, is a
contentious issue in the field (and is discussed further below). The former definition, assessing
worth or merit, inescapably involves acquiring and assessing information, but the revised
version does focus on the information. It suggests that assessing worth depends on an analytical
approach to information, that is, on an understanding of the object about which feedback is
required.

Another, this time British, attempt at revising the first definition was in connection with the
evaluation of educational institutions. It defined such evaluation as involving:

the making of judgements about the worth and effectiveness of educational intentions,
processes and outcomes; about the relationships between these; and about the resource,
planning and implementation frameworks for such ventures. (Adelman and Alexander 1982, p.
5)

While retaining the notion of making judgments about worth, there are two important
extensions in this version. First, the object of study has acquired intentions, processes and
outcomes; it is a complex sequence in which the parts have relationships, and it is therefore
clear that evaluation is concerned in some way with that sequence. Second, this sequence is not
isolated. It is in a framework which has to do with resources, planning and implementation.
Evaluation therefore understands the sequence only by also taking account of the framework
in which the sequence takes place. The curriculum is in a classroom with its relationships, in a
school, and in a complex and interactive context involving families and communities, authorities
and the various levels of policy making - all of which affect what is taught and learned. Further
education colleges and universities have their own departmental, disciplinary, institutional and
other contexts that may have to be taken into account when a project or initiative is evaluated.

In considering evaluation in your institution, are there possible major issues concerning
relationships in the context of management, the whole institution, outside constituencies
and agencies...?

If you were to conduct an external evaluation in an institution other than your own,
how different might the issues be from the ones you have considered above?

Types of evaluation

With these preliminary considerations in mind, it would be helpful to look carefully at the
following and make some tentative choices regarding the role or roles that may seem most
appropriate in your evaluation of the project, programme, innovation or other initiative (for
simplicity's sake we will encompass all of these from now on in the term 'project'). The
evaluator's role is:

to be as objective as possible (interviewing, questioning, reporting on findings, not being too
close to the participants) and to report to the person or body for whom the evaluation is
conducted;

to collect data rigorously and scientifically;

to feed back impressions to participants (so that they can take note of your findings and
improve their activities);

to understand and describe the project and make judgments;

to be involved with the project from the outset, working with the project participants to
plan their programme and the evaluation together;

to define the nature and methodology of the evaluation professionally, to begin work
when the project is operational and to monitor it at agreed intervals and at the end;

to monitor the process, that is, the implementation of the initial terms of reference or
objectives of the project;

to focus on the life of the project in its relevant wider contexts;

to investigate the outcomes, successful or unsuccessful, of the project;

to judge whether the project has been (or is likely to be) value for money;

to conduct an external evaluation and nothing more;

to help participants to conduct an internal evaluation, in addition to the formal external
one, or as a substitute for it;

Or...

It will be clear from the choices available that evaluation is far from being a simple or standard
activity. The choices are neither right nor wrong, but may be more appropriate to particular
programmes, conditions and requirements, and to the self-image of the evaluator. Evaluators
and evaluation theorists have extensively explored the alternatives and these have been the
focus of various kinds of controversy. To compare your own preferences or issues with some of
those in the literature in terms of types of evaluation, click here. We cannot here consider all of
these alternative approaches, but it is important to emphasise two that are frequently met in
the evaluation literature.

Process and impact

The purposes of evaluation can be encapsulated in these two terms, the former to highlight
what is and has been happening, the latter to attempt to indicate what has happened as a
result. Both encounter difficulties.

Process evaluation is targeted on implementation: how the programme's intentions are
being interpreted and the experience of conducting the activity, together with the
continuing or changing perceptions of the various constituencies involved. The kinds of
questions that such evaluation raises may include conflicts in these perceptions for
reasons not necessarily connected with the activity itself, confusion about the original
terms of reference or doubts about their wisdom. The larger the programme, the more
difficult are the questions of sampling (how many people to interview and how to select
them, what activities to attend) and of when it is reasonable to monitor what is taking place.
For an external evaluator there may be problems of time allocation and frequency of
involvement, depending on the nature and extent of the programme (multi-site,
national), though even with a small, single-institution activity initial decisions about
the extent of the external evaluator's involvement may cause problems. Often called
'implementation evaluation', this approach commonly faces difficulties in collecting
reliable information on how successfully the implementation is taking place.

Impact (or outcomes, or sometimes product) evaluation raises some of these issues,
but also different ones. Would the outcomes of the programme have happened
without the intervention, and is there a credible causal connection between the activity
and the impact? Answers to the question of what impact has taken place may be positive,
negative or mixed; that is, an evaluation may be of non-success, evidence of non-impact,
or of the complexities that have arisen from other factors: for example, the result of
other interventions, processes and contexts. Impact may cover time scales that vary
considerably from programme to programme (eg a limited research/development
programme in a school or university, or a World Bank project covering a nation or region).
Impact may be studied not only at the conclusion of an activity (or its funding) or after
an interval of time, but also during the activity, especially if it is designed to provide
regular feedback or if it is a longitudinal study. Evaluations of the American Head Start
and similar programmes, for example, involved the evaluation of learning gains and
other measures in a variety of ways at intervals over very long periods. It is common for
evaluators of limited-time projects to feel (and suggest) that the real impact evaluation
could only take place several years after the end of the programme. Depending on the
project, impact evaluation may have policy or decision-making implications:

An impact evaluation assesses the changes in individuals' well-being that can be attributed to a
particular program or policy. It is aimed at providing feedback and helping improve the
effectiveness of programs and policies. Impact evaluations are decision-making tools for
policymakers and make it possible for programs to be accountable to the public. (World Bank,
website)

Such a role for the evaluator raises questions, discussed below, of the kind of contract agreed at
the beginning of the evaluation, and the possible influence of the audiences for the reporting
procedure at the end. There are issues about the tentative or reliable nature of impact data,
which may differ considerably by type of project. Since a funding agency may require impact
data and an evaluator may find such data unattainable, there is room for misunderstanding and
conflict.

Formative and summative

These may be, but are not necessarily, related to the above.

Hopkins (as we saw above in terms of types of evaluation) made the simple suggestion that
formative evaluation was when the cook tasted the soup, and summative when the guest tasted
it. He also suggested that the difference was not so much 'when' as 'why'. What is the information
for: further preparation and correction, or savouring and consumption? Both lead to
decision making, but toward different decisions (Hopkins 1989, p. 16). This latter distinction
establishes the difference between these concepts and those relating to process and impact.
Formative evaluation is designed to help the project: to confirm its directions, or to influence or
help to change them. It is more than monitoring or scrutinising; it serves a positive feedback
function (which process evaluation does not necessarily do). Summative evaluation is not just
something that happens at the end of the project: it summarises the whole process, describes its
destination, and though it may offer insights into impact, it is not concerned solely with impact.

Summative evaluation has often been associated with the identification of preset objectives
and judgments as to their achievement (again, not necessarily in terms of impact). The
assumption in this case is that, unlike in formative modes, evaluation is not (and should not be)
involved in changing the project in midstream, since otherwise the relationship between
objectives and their achievement cannot be evaluated:

every new curriculum, research project, or evaluation program starts with the specifications to
be met in terms of content and objectives and then develops instruments, sampling procedures,
a research design, and data analysis in terms of these specifications. (Bloom 1978, p. 69)

Starting specifications that are expected or required to be met therefore dictate the nature of
the summative evaluation. The instruments or sampling procedures cannot produce 'pure' data
if the process is corrupted by the intervention of evaluator feedback or other alterations to the
original specifications. It is possible to conceive of evaluation as both formative and summative,
but in this case 'summative' comes closer to meaning 'final', and cannot present data and make
judgments as purely as is suggested in Bloom's definition.

Other approaches to evaluation emerged in the last quarter of the 20th century, and some will
be mentioned further below in relation to methodology. These have included 'illuminative',
'democratic' (as opposed to 'bureaucratic'), 'participative' and 'responsive' evaluation. These all
have implications for the role of the evaluator in relation to the project: for example, sharing
with the project participants, responding to the activity rather than to specifications and
intentions, identifying and reporting differences of perspective and values, and emphasising the
importance of understanding or recording competing perceptions. Much of this work relates to
discussion in other RESINED components, notably action research and case studies.

You could at this point consult the paper by Parlett and Hamilton on 'Evaluation as illumination'
in Hamilton et al., Beyond the Numbers Game (quoted in types of evaluation), and other
contributions to this influential book.

See also the chapter on 'Program evaluation: particularly responsive evaluation' by Robert
Stake, in Dockrell and Hamilton, Rethinking Educational Research, and Helen Simons,
Getting to Know Schools in a Democracy: the politics and process of evaluation.

3 Research?

We have so far by-passed discussion of the terms 'evaluation' and 'evaluation research' and the
difficulties inherent in this vocabulary, on its own and in conjunction with other terms
sometimes used in relation to evaluation, including 'applied research' and 'academic research'.
Jamieson suggests that there are basic differences between the last of these and evaluation
research, in the degree of constraint on their purpose and operation, the funding and its
implications, and publishing and reporting:

Evaluation reports and research reports not only have different audiences but their main
objectives are different. The goal of the research report is the enhancement of understanding
and knowledge via publication to the scientific community. The main goal of the evaluation
report is to inform and/or influence decision makers; the relative emphasis of the two activities
must be different. (Jamieson 1984, pp. 72-3)

This seeks to establish one kind of distinction, but Jamieson also uses 'evaluation' and
'evaluation research' interchangeably in the argument. So is evaluation a form of research? The
question ultimately raises issues about the nature and definition of research as well as of
evaluation, and to approach these issues let us take some examples of discussion of the
relationship.

Different or the same?

For some commentators the distinction is between evaluation and research, ignoring any such
concept as 'evaluation research'. The distinction drawn is generally with the methodology of
research based in the social sciences, often directed towards answering questions relating to
policy, even towards improving it. Evaluation, particularly in its more recent approaches, is seen
as serving a very different purpose. Parsons argues that if evaluation is seen as serving the
interests of decision makers then it has no right to claim the title of evaluation: it is then a form
of research and should obey the rigorous rules of research, and it is the decision makers who are
the real evaluators. He particularly excludes formative evaluation from the definition of
research:

Formative evaluators work alongside development or action research teams with the task of
feeding such teams with information that might help them modify their work, counter
weaknesses, anticipate problems and so on. The formative evaluator is an internal critic and
provides an information feedback service… [Formative evaluation] serves a narrow audience, the
developers, and to be effective needs to be closely allied to or an integral part of the team. The
commitment thereby generated would make the formative evaluator suspect as the provider of
objective summative information of significance to a wider audience. (Parsons 1981, pp. 40-2)

This is a critique of claims for evaluation as research. Others, however, see the distinction as a
necessary and positive one. A crucial point in this argument is identified by MacDonald and
Walker:

The methodological difficulties faced by curriculum evaluators who want to offer a
comprehensive range of information about new programmes have drawn them to the
case-study as a technique. Many of the quite legitimate questions that are put to evaluators,
especially by teachers, cannot be answered by the experimental methods and numerical
analyses that constitute the instrumental repertoire of conventional educational research.
(MacDonald and Walker 1977, p. 181)

This argument refers to experimental methods and numerical analyses as 'conventional'
research, itself under attack from case study and other approaches to research (including action
research). There is, of course, no one way to conduct case study research or action research, but
broadly speaking, distinguishing evaluation from research also involves drawing a distinction
between both of them and conventional forms of research.

Evaluation organisations themselves sometimes distance themselves from such social policy-
based versions of research. In American examples chosen earlier it may be difficult to judge in
what ways they are research. The Action Evaluation Research Institute defines its central
evaluation activity as

a new method of evaluation, one that focuses on defining, monitoring, and assessing success.
Rather than waiting until a project concludes, Action Evaluation supports project leaders,
funders, and participants as they collaboratively define and redefine success until it is achieved.
Because it is integrated into each step of a program and becomes part of an organization, Action
Evaluation can significantly enhance program design, effectiveness and outcome. (AERI [2000?],
website)

Explicitly, the approach is differentiated from traditional evaluation, and implicitly its purposes
and methodologies differentiate it from traditional research. The strategy may be based on
extensive research, but the strategy itself is difficult to define as research.

Click back on types of evaluation and judge whether you think the examples
can or cannot be described as research.

It can also be suggested that evaluation and research are the same, or out of the same stable of
activities, not least by using the concept and title of 'evaluation research'. An early American
attempt to consider the relationships between research and evaluation studies thought it
evident that many of the activities undertaken in evaluation and in research in education were
the same. In research itself, it pointed out, a distinction is often drawn between 'applied' research
and 'basic' research on the basis of utility or simply new knowledge. Since evaluation
studies are made to provide a basis for making decisions about alternatives, questions of utility
are also addressed. This account sets out the range of possible differences between 'ideal'
research and evaluation studies, and though the differences exist it concludes that they share
many characteristics of method and approach: they both add to new knowledge, stimulate and
benefit from the development of theory, and contribute to a science of education. The essential
differences are not those of the evaluator and the researcher, but those of different kinds of
subsequent decision-makers:

The consequence of the differences between the proper function of evaluation studies and
research studies is not to be found in differences in the subject interest or in the methods of
inquiry of the researcher and of the evaluator. It is to be found in the manner in which the
outcomes of the two types of studies are used and regarded. (Hemphill 1969, pp. 189-92)

'Studies' is here simply a substitute for 'research'. The defence of evaluation as either a form of
research or as part of the same family has continued to emphasise that the confusion has
related to stereotypes of both activities. Both have encountered debates about methodology,
including a case study approach; both have erected and torn down barriers round their
respective professional communities; both have faced problems about their relationship to
patrons, funders and audiences.

These debates about evaluation and/or research can be pursued in chapters by Parsons
('A policy for educational evaluation') and Simons ('Process evaluation in schools') in Lacey and
Lawton, Issues in Evaluation and Accountability; in Stenhouse's chapter on 'The evaluation of
curriculum' in An Introduction to Curriculum Research and Development; or in other items in
the bibliography below.

Is it simply a case of 'it all depends on what you mean by...'?

4 Methods

Whatever the distinctions between academic research and evaluation research, the research
methods used are broadly similar, though any given activity in either may use only a segment
of the methods available, and these will be drawn overwhelmingly from the methods of
qualitative research. Across the range of evaluation approaches the methods will include
interviews and questionnaires, focus discussion groups and observation, case studies and diaries
or logs. Some of these are discussed in the RESINED components on interviews, observation
techniques and questionnaires in education research. Some methods are used for particular
evaluation strategies and purposes.

In objectives-based and some other kinds of evaluation, for example, pre-test and post-test
strategies are likely to be used, in order to provide a baseline on which to make judgments
about how much has changed as a result of the project. An American approach to evaluating
whole school reform explains the strategy as follows:

This model makes the assumption that without the intervention, things will go on as they did
before. Other things being equal, teachers will continue to teach as they did before, and
students will continue to show the same pattern of achievement as they did before. With the
intervention, things will change over time, it is hoped in a positive way… The model can include
repeated measures… The pattern of change at different points in time can then be interpreted
as a result of the intervention. (Northwest Regional Educational Laboratory, website)

A health-related example goes into greater detail:

In order to determine how well the program is working to change those factors that cause social
problems, an evaluation needs to address these specifically. Often, this means focusing upon
how much behaviors or behavioral determinants (knowledge, attitudes, beliefs, skills, or
values) have changed from prior to the program intervention until sometime after it. The
questions answered in this type of evaluation refer to the program's goals and how well they
are being reached. For example:

o How much did participants' knowledge of tobacco as an addictive drug change due to the
program?

o Have youth feelings of community empowerment increased between the start and finish of
the program?

o Are high school students less likely to engage in alcohol use because of the program?

The most common way to answer these questions is through the use of pre and post surveys of
participants. This is not the only way to gather data on changes in behavior or behavioral
determinants… However, the pre and post survey method can provide a good way to compare
participants before and after the program intervention. (Nebraska Council to Prevent Alcohol
and Drug Abuse, website)

The model makes assumptions about the possibility of outcomes occurring directly or uniquely
as a result of the intervention, and being susceptible to accurate measurement. The components
of such an approach include objectives, data collected by test surveys, and rigorous adherence to
an impact model; and though it is possible to collect change data during as well as at the end of
the project, relaying these data back to the project formatively would distort the ability to
measure pre- and post-intervention situations. As with Bloom's description of summative
evaluation, the problem is that of ensuring the achievement of undistorted, uncontaminated data.
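
To make the pre- and post-test logic concrete, here is a minimal Python sketch comparing
participants' scores before and after an intervention, of the kind gathered by the pre and post
surveys described above. It is only an illustration: the file name scores.csv, its columns
(participant, pre, post) and any results are hypothetical, and a paired comparison of this kind
cannot by itself show that the intervention, rather than other factors, produced the change.

# Minimal sketch, hypothetical data: comparing pre- and post-intervention scores.
# Assumes a CSV file "scores.csv" with columns: participant, pre, post.
import csv
from math import sqrt
from statistics import mean, stdev

pre, post = [], []
with open("scores.csv", newline="") as f:
    for row in csv.DictReader(f):
        pre.append(float(row["pre"]))
        post.append(float(row["post"]))

diffs = [b - a for a, b in zip(pre, post)]   # individual change scores
mean_change = mean(diffs)
# Paired t statistic: mean change divided by its standard error.
t_stat = mean_change / (stdev(diffs) / sqrt(len(diffs)))

print(f"n = {len(diffs)}")
print(f"mean pre = {mean(pre):.2f}, mean post = {mean(post):.2f}")
print(f"mean change = {mean_change:.2f}, paired t = {t_stat:.2f}")
# A large t value suggests the change is unlikely to be chance variation alone,
# but it cannot by itself attribute the change to the intervention.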

An evaluation method used primarily in higher education is that of student feedback, on a new
course or form of delivery, or regularly on the student experience. Of the formal methods of
obtaining feedback, questionnaires are the most common. Students may be asked to give their
views on the curriculum and the teaching, the course management and assessment, and the
analysis of the questionnaire may be used as part of a broader process, as a basis for interviews
or group discussion. An alternative is some form of structured (or 'pyramid') group feedback, in
which the group is split into small and then larger groups, with agreed points being reached at
each stage, for presentation at the end in plenary discussion. The aim is to obtain feedback
without any person or group dominating the response. Nominal Group Technique has the same
purpose, but normally involves no discussion (except for item clarification), being based on
participants' own written recording of their views, including nominating points for inclusion in
the report, and then a presentation by the session leader of the ideas expressed, with no
attempt to evaluate the suggestions. The NGT procedure aims at maximum objectivity of
feedback, and shares with other structured feedback techniques an attempt to make feedback
representative. The purpose of regular feedback of these kinds is to inform the teaching staff of
the state of a course or the success or otherwise of an intervention, and it is therefore a
different kind of formative evaluation. The procedures can therefore also support the evaluation
of an initiative, and stand alongside other forms of evaluation of teaching or curriculum change
(for fuller details see O'Neil and Pennington, Evaluating Teaching and Courses from an Active
Learning Perspective, pp. 21-34).
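
As a small illustration of how the written submissions from a Nominal Group Technique session
might be collated, the Python sketch below simply counts how often each nominated point
appears across participants, with no attempt to evaluate the suggestions. The participants and
points are invented for the example; in practice the session leader would also need to group
near-duplicate wording by hand.

# Toy sketch, hypothetical data: collating points nominated in an NGT session.
# Each participant submits written points; the tally reports how often each
# point was nominated, without evaluating the suggestions themselves.
from collections import Counter

submissions = {
    "participant_1": ["more feedback on assignments", "clearer module aims"],
    "participant_2": ["clearer module aims", "better access to resources"],
    "participant_3": ["more feedback on assignments", "clearer module aims"],
}

tally = Counter(point for points in submissions.values() for point in points)

print("Points nominated, by frequency:")
for point, count in tally.most_common():
    print(f"  {count} x {point}")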

Although structures of the kinds outlined above are not typical of illuminative or similar
approaches, it is important that some kind of structure is involved. Parlett and Hamilton, for
example, introduce illuminative evaluation as being in three characteristic phases: investigators
observe; inquire further; and then seek to explain. They give an example of how these three
stages took place and overlapped, and with this three-stage framework an information profile is
assembled using data collected from four areas: observation, interviews, questionnaires and
tests, documentary and background sources (Parlett and Hamilton 1976, pp. 14-15). For course
development, training or other initiatives the range of methods available is wide. For evaluation
generally, focus groups and questionnaires, interviews and implementation logs, feedback and
testing methods are part of a menu of approaches. One American list of evaluation tools for
projects involving interactive media contains 39 items, without including anything relating to an
ethnographic approach.

What all of these methods do is attempt to penetrate the complexities of the social situations
within which evaluation of an initiative takes place, whether the evaluation is descriptive or
judgmental. Policy, project, documentation, process and outcome are not givens, and the
evaluator relates to them in a variety of ways, using a variety of approaches. There is no single
appropriate evaluation method, only the selection of an approach or approaches for a
particular situation, depending on the predominant assumptions of the evaluator, the project
team and the evaluation sponsors in some kind of understanding or negotiation. Determining
the strengths and weaknesses of a particular method is therefore itself an elusive process, and a
great deal of emphasis has to be laid on all the preliminary encounters and insights obtained
either at the beginning of the evaluation or, preferably, before the project and the evaluation
are launched. The evaluator's statement of intent, terms of reference or contract therefore
needs to be clear about a number of agreed principles, though not necessarily, of course, in any
of the following vocabularies:

1. Prior clarification of the assumptions underpinning the project and the assumptions of the
evaluator. This could lead to identifying the style of evaluation that would be appropriate:
whether based on objectives and measurement or on a kind of joint exploration between project
and evaluation, whether 'autocratic' and 'bureaucratic' or 'democratic'. An important element in
this clarification is identifying the source of power: for whom is the evaluation primarily
intended, and who can influence the operation of the evaluation?

2. What is most wanted from the evaluation? A final calculation of whether the project has been
value for money? Ongoing feedback on implementation (is it working?), and feedback to whom?
What are the proposed outcomes? Are there already (eg in a funding contract) defined objectives
and expectations, possibly against existing baseline data? Answers to such questions may
determine whether to define the evaluation as description and enlightenment, or as analysis
and judgment.

3. The intended end product of all interventions is change. What kind of change is anticipated,
and how can the effectiveness of the intervention in producing it be either measured or
portrayed? Discussions of evaluation often focus on what does not produce change: for example,
there are views that self-evaluation by an institution is unlikely to result in change; that testing
instruments do not reveal what actually happens and what produces the change; that
observation may reveal little of changes in teaching. The strengths and weaknesses of a
particular method may therefore be a function of its ability to respond to the underlying
questions of any project (who wants to know what, and why?) and of whether there is a method
or methods that will offer something worthwhile.

5 Internal evaluation

The focus of the discussion here has been on external evaluation. The assumption has been that
evaluation with a research connotation is conducted by someone (or a team or group) external
to the project evaluated or to the institution. In the absence of, or in collaboration with, an
external evaluator some of the approaches and methodologies discussed above may also be
applicable to the internal (or 'self-') evaluation conducted by the project leader or team. One
task of an external evaluator may be to advise on internal evaluation methodology (interviews,
questionnaires) on an initial or ongoing basis. It is common for small projects or initiatives,
whether funded from within or outside the institution, to require an evaluation without making
provision for an external evaluation.

Some internal evaluation may consist solely of the collection of limited data (perhaps similar to
that undertaken to obtain student feedback in higher education, discussed above). Where a
single person is responsible for the project and has an audience or partnership (for example, of
students, colleagues, other professionals, patients) there may be a tendency to rely on
informal feedback or opinions from those with whom the project has occasional or regular
contact. This in no way meets the requirements of any of the versions of evaluation research
that we have considered. To answer the questions 'What do you know?' and 'How do you know
it?' something more systematic has to take place.

As with all evaluation and research there is a strong temptation to include, and in internal
evaluation to rely on, questionnaires as the source of data. Planning, conducting and analysing a
questionnaire are subject to difficulties and pitfalls (see the RESINED component on
questionnaires). Interviews, particularly within the limited community that may be covered by a
small project, may be difficult to conduct. Structured discussion in small groups, using what are
described above as focus groups or a pyramid approach, may be a useful tool in these kinds of
situation, combining structure and focus for small-group discussion. The use of logs or diaries by
participants in the project may be a valuable alternative or supplement to any of these
approaches.

For a small, internal initiative (though possibly one of a number of such initiatives) the
institution is unlikely to enter into a contract or agreement on evaluation in the same way as it
would for an external evaluator. What is needed, however, is agreement at the appropriate level
for at least initial contact by the project leader(s) or team(s) with a consultant who can discuss
and advise on the options available for internal project evaluation. The onus rests with the
committee or senior staff to ensure not just that the evaluation takes place but also that such
advice is available to those conducting the project.

6 The evaluator

The conduct of an evaluation may be by a full-time professional evaluator, part-time by a
member of the same or another institution, by a team of two or three people or by a much larger
team. The evaluation may be for a short period or for a number of years. The appointment of
the team may be by the institution, with a great deal of input or very little input by those
conducting the project. The evaluator may be required to report to a steering group or
committee responsible for the project, to the institution, or to the funding body, or to some or
all of these. The ground rules for the evaluation may be decided by the evaluator, or may
pre-exist the appointment.

On the last of these points, for example, the government's Department of Employment (as it
was in 1994) issued a document entitled Evaluating Development in Higher Education: a guide
for steering committees, contractors and project staff. The Employment Department, like all
government departments and others, did not just fund projects: it contracted with an institution
or organisation for a piece of project work, and the contract stipulated a series of requirements.
On the question of evaluation the Department indicated:

All this work requires evaluation. All the partners (individual staff, departments, project steering
groups, institutions and the Department among others) need to know whether it has been
successful and is worth imitating, what lessons there are for the future, and what further
development or research may be needed. Without this, resources, including scarce staff time
and energy, will be wasted in repeating mistakes and rediscovering what is already known.

For guidance the document set out 11 key questions, on such matters as the customers for the
work and the evaluation, the balance between formative and summative evaluation, the data
and the outcomes, and baseline information. It suggested that steering groups might add their
own questions, concerning how the planned work was carried out, whether each objective was
achieved, future development and value for money. Although this was a guide, there were
issues that should always be addressed:

assessing contract compliance and value for money

contributing formatively to development

informing future agenda building and gathering intelligence

informing the review of development and evaluation methodology (Department of
Employment, click here for greater detail)

In this kind of case, with such requirements built in at the contractual stage before the
appointment of an evaluator, the latter will enter a pre-determined situation, since the
institution or steering group will have committed themselves to a project and an evaluation
within this framework. The evaluator will have room for manoeuvre at the margins, mainly in
the selection of a methodology that will provide answers to the questions that have already
been formulated.

In other situations, of course, the steering group or project team will have only the broadest (if
any) guidance, and the evaluator, possibly in order to secure the appointment, will be asked or
will volunteer to supply an evaluation brief setting out in appropriate detail what the evaluator
intends to do (style of evaluation, time commitment, ownership of the data and the evaluator's
report(s), means of negotiating any changes in evaluation procedure). Some situations may
require only an informal relationship, generally for modest interventions without external
funding. In all cases, even the most informal, a contract between the project management or
the institution and the evaluator is essential, even if only to specify time, payment and any
requirements, for example the date by which a final evaluation report has to be provided.
These preliminaries are necessary, but they only partially protect the evaluator. As one
commentator put it:

people who accept positions as evaluators place themselves in a vulnerable position: to put it
neatly, the evaluator sets himself up for evaluation… In embarking on an evaluation the evaluator
makes a commitment to deliver some goods; failure to deliver the goods, or to deliver superior
goods, will be an embarrassment at least, if not a serious threat to his academic status or career
prospects. (Gomm 1981, p. 127)

This threat can be particularly acute if there are multiple audiences for the report, and it is not
impossible for evaluators to be tempted to minimise it by muting critical content in the report.
Stake, in the United States, describes the position to make it more than just a hypothetical one:

It is recognized, particularly by Mike Scriven and Ernie House, that cooption is a problem, that
the rewards to an evaluator for producing a favorable evaluation report often greatly outweigh
the rewards for producing an unfavourable report. I do not know of any evaluators who falsify
their reports, but I do know many who consciously or unconsciously choose to emphasize the
objectives of the program staff and to concentrate on the issues and variables most likely to
show where the program is successful. I often do this myself… (Stake 1980, p. 74)

A form of reporting that entails judgments and possibly recommendations (a common but not a
universal element of reports) therefore raises particular issues of this kind. Cooption is a
danger of case study, illuminative or similar forms of evaluation, since the evaluator by
definition in these cases works closely with the team and may feel tempted or even obliged, as
Stake suggests, to highlight their view of the process and outcomes, and given the trust that has
been involved, to highlight the positive ones. The danger is not inevitable, and can be overcome
by adherence to the initial principles and strategies agreed for the project. This makes the initial
agreement, and the forms of consent of the parties concerned, all the more important.
Agreement at the outset needs to be clear about the process, the outcomes, the audiences and
the nature and purpose of the report.

Examples

Given normal undertakings of confidentiality, the literature of evaluation contains few examples
of actual reports. Those that are in the public domain are normally those submitted on major
national or international initiatives; they may be very substantial, and in some cases they are on
internet or intranet websites. A 100-200 page final report, probably highly statistical, on a
multimillion pound or dollar project in agriculture or literacy will not help to illustrate the issues
discussed here at more modest levels. However:

An anonymised report 'On-line learning (OLL): an evaluation' can be viewed by clicking
here (to download a 55K Word document). This gives some idea of what a report on a
substantial national project (covering seven universities) might contain, and indicates
some of the evaluation methodology on which it was based.

Click here for an article by Ian Jamieson (1983) on 'The role of evaluation in action-research
projects: the case of the Schools Council Industry Project', Cambridge Journal of
Education, vol. 13, no. 2, pp. 37-45. This describes the evaluation process in a large,
four-year action-research project on teaching about industrial society in the school
curriculum. This is not a report, but it gives a clear idea of the relationship between the
evaluator's purposes and the work of the project team.

Some good advice from Rob Phillips and Tony Gilding on 'Approaches to evaluating the
effect of ICT on student learning' (a 333K Word document in rtf format) is available by
clicking here.

7 Advice

It may help finally to summarise some advice to have in mind when undertaking an evaluation:

1. Ensure at the outset that you have a full discussion of what you are going to be doing,
resulting in an agreed written statement. This may cover time scales, finance, and reporting
(frequency, to whom, ownership of reports).

2. Be sure whether it is a process or impact, formative or summative, evaluation, though this is
not necessarily the language of what is agreed.

3. Be clear about the intended methodology (observation, interviews, questionnaires, focus
groups, diaries) and the relationship with the project team, other participants and project
management (senior staff, steering committee).

4. Be sure about confidentiality (eg if formatively reporting to the project team, what
information it is legitimate, or not, to reveal; whether interviewees will be identified or
identifiable in reports). The project team and others involved need to understand the
confidentiality position, and it may be advisable to explain this and other matters in writing for
everyone concerned (commonly referred to as an ethics protocol).

5. If there is also to be an internal evaluation, consider what help you can give on its purpose
and methodology.

6. When submitting reports (interim, final), will they go first as drafts (to whom?) to be checked
for accuracy, not to challenge or confirm your judgments (it is your report)?

7. Consider, throughout the evaluation process, your own and shared purposes, the
effectiveness of your methodology, and the appropriateness of your relationships.

8. Take account of what literature may be helpful.

Given the hypothetical evaluation or evaluations that you have considered, including some
of the problems or difficulties, and given your own position, if invited to conduct such an
evaluation would you do it?

If so, why? If not, why not?

My own final reflection is that I hope you would!

8 References and further reading (including websites)

Some of the items below are accessible on the Internet as indicated. There are books that it would
be worth reading, but where possible chapters or papers in books are suggested. Items in bold are
the most recommended.

Action Evaluation Research Institute (2000?), Helping groups define, promote and assess
success, http://www.aepro.org/ [including overview, methodology, recent essays and
conceptual frameworks].
Adelman, Clem and Alexander, Robin J. (1982), The Self-Evaluating Institution: practice and
principles in the management of educational change, Methuen, London.

Albee, Alana (1999) Assessing impact: some current and key issues, Caledonia Centre for Social
Development, http://www.caledonia.org.uk/pia.htm [a very useful paper].

Bloom, Benjamin S. (1969) Some theoretical issues relating to educational evaluation, in Tyler,
Ralph W. (ed.), Educational Evaluation: new roles, new means, National Society for the Study of
Education, Chicago [perceptive study of objectives, specifications and outcomes].

Bloom, Benjamin S. (1978), Changes in evaluation methods, in Glaser, Robert (ed.), Research
and Development and School Change, Lawrence Erlbaum, New York [useful insights into early
assumptions about evaluation].

Burgess, Robert G. (ed.), Educational Research and Evaluation: for policy and practice?, Falmer
Press, London [chs include local and national evaluation, and relationship (if any) of evaluation
to policy].

Department of Employment, Further and Higher Education Branch (1994), Evaluating
Development in Higher Education (duplicated).

Ellington, Henry and Ross, Gavin (1994), Evaluating teaching quality throughout a university: a
practical scheme based on self-assessment, Quality Assurance in Education, vol. 2, no. 2, pp. 4-9
plus annexes.

Gomm, Roger (1981), Salvage evaluation, in Smetherham, David (ed.), Practising Evaluation,
Nafferton Books, Driffield.

Hamilton, David et al. (1977), Beyond the Numbers Game: a reader in educational evaluation,
Macmillan, Basingstoke [invaluable, including key writers, MacDonald and Walker on case
study, and the influential Parlett and Hamilton study of 'Evaluation as illumination'. Can be read
selectively].

Hemphill, John K. (1969) The relationship between research and evaluation studies, in Tyler,
Ralph W. (ed.) Educational Evaluation: new roles, new means, National Society for the Study of
Education, Chicago [useful discussion of the relationship].

Hopkins, David (1989), Evaluation for School Development, Open University Press, Milton
Keynes [First 2 chapters are a good introduction to types of evaluation and an argument for
evaluation in the service of development].

Jamieson, Ian (1983), The role of evaluation in action-research projects: the case of the Schools
Council Industry Project, Cambridge Journal of Education, vol. 13, no. 2, pp. 37-45 [brief account,
raising many of the issues discussed here].
Jamieson, Ian (1984), Evaluation: a case of research in chains?, in Adelman, Clem (ed.), The
Politics and Ethics of Evaluation, Croom Helm, London [the publishers failed to have this book
proof read, so read this chapter with care!].

Kogan, Maurice (1986) Education Accountability: an analytic overview, Hutchinson, London
[particularly ch. 6, Epistemologies and evaluation].

Lawton, Denis (1978) Curriculum evaluation: new approaches, in Denis Lawton et al., Theory
and Practice of Curriculum Studies [short, but covers most of the issues raised here].

MacDonald, Barry and Walker, Rob: see Hamilton et al. above.

Manchester University Department of Applied Social Science (n.d.) Evaluating policy and
practice, [brief account of postgraduate course approach, objectives, course content].

Nebraska Council to Prevent Alcohol and Drug Abuse (2000), The Least You Need to Know
About ..., http://www.nde.state.ne.us/SDFS/ATOD/evaluation.html [types of evaluation].

Northwest Regional Educational Laboratory (2000), Evaluating Whole-School Reform Efforts: a
guide for district and school staff, http://www.nwrac.org/whole-school/index.html [including
good sections on impact evaluation].

O'Neil, Mike and Pennington, Gus (1992), Evaluating Teaching and Courses from an Active
Learning Perspective, CVCP Universities Staff Development and Training Unit, Sheffield [mainly
on evaluating teaching in higher education, especially methods of collecting evidence].

Parlett, Malcolm and Hamilton, David: see Hamilton et al. above.

Parsons, Carl (1981) A policy for educational evaluation, in Lacey, Colin and Lawton, Denis,
Issues in Evaluation and Accountability, Methuen, London.

Simons, Helen (1981), Process evaluation in schools, in Lacey, Colin and Lawton, Denis, Issues in
Evaluation and Accountability, Methuen, London.

Simons, Helen (1987), Getting to Know Schools in a Democracy: the politics and process of
evaluation, Falmer Press, Lewes.

Stake, Robert E. (1980), Program evaluation, particularly responsive evaluation, in Dockrell,
W.B. and Hamilton, David (eds), Rethinking Educational Research, Hodder and Stoughton,
London.

Stenhouse, Lawrence (1975) An Introduction to Curriculum Research and Development,
Heinemann, London [Ch. 8 on The evaluation of curriculum is a key text].

Trochim, William M.K. (2002), Introduction to Evaluation,
http://www.socialresearchmethods.net/kb/intreval.htm [definitions, strategies, types,
questions and methods; link to The Planning-Evaluation Cycle and An Evaluation Culture. Based
on the book Research Methods Knowledge Base].

Wayne State University Center for Urban Studies (n.d.) account of approach to evaluation
research, go to http://www.cus.wayne.edu/capabilities/intro.asp and click on 'evaluation' in
bullet point list.

Weiss, C.H. (1998, 2nd edn), Evaluation: methods for studying programs and policies, Prentice
Hall, New York [massive compendium, suitable for consulting; very expensive].

World Bank Group (2001) Poverty Net, http://worldbank.org/poverty/impact/ [substantial
account of an approach to large-scale project evaluation, including understanding impact
evaluation, methods and techniques, many examples and readings; valuable insights].
