
Assessment

Evaluating assessment: the missing link?

S L Fowell,1 L J Southgate2 and J G Bligh1

Background Methods chosen for assessment, and the manner in which they are applied, are so intimately associated with how individuals learn that developing appropriate assessment strategies is a key part of effective curriculum development.

The assessment cycle We describe a four-stage assessment cycle identifying important steps in assessment. Each step is described in detail, stressing its key aspects, including: the need for a clear assessment policy and strategy, the importance of assessment blueprints in planning assessment, the need for effective feedback when presenting results, and the essential, but often overlooked, need to evaluate the assessment process itself.

Evaluating assessment This final evaluation stage is the most important part of the assessment cycle and can be divided into four levels. The first level includes evaluating each question in the assessment, the second level is concerned with establishing validity and reliability, the third level centres on the assessment process and the review of assessments by external examiners, and the fourth level involves evaluation over several assessments.

Relating assessment to the curriculum This long-term evaluation should examine whether existing assessments are congruent with the curriculum and relate to all facets of the students' learning experiences. This is particularly important in a curriculum where the learning outcomes of student-centred learning are emphasized. Changes in the assessment of postgraduate trainees and increasing emphasis on peer review of clinicians will raise the profile of these outcomes in undergraduate education.

Keywords Undergraduate medical education, *organization, *educational assessment, curriculum, programme evaluation, feedback, Great Britain.

Medical Education 1999;33:276–281

1 Department of Health Care Education, The University of Liverpool, 3rd Floor, University Clinical Department, Duncan Building, Liverpool L69 3GA, UK
2 University College London Medical School, 4th Floor, Archway Wing, Whittington Hospital Campus, Highgate Hill, London N19 5NF, UK
Correspondence: S L Fowell, Department of Health Care Education, The University of Liverpool, 3rd Floor, University Clinical Department, Duncan Building, Liverpool L69 3GA, UK

Introduction

Most medical educators involved in curriculum planning and development recognize the interplay between assessment and learning, and that to a large extent assessment drives learning. Developing an appropriate assessment strategy is a key part of effective and sustainable curriculum development. Devising suitable assessment systems is not a one-off process. Four distinct steps make up the assessment cycle:

1. planning and preparation;
2. developing and carrying out assessment;
3. presenting the results;
4. evaluating the assessment.

Steps 1 and 4 pose major challenges to medical educators and curriculum planners. In practice, step 4 is often omitted, although it is this evaluation phase that influences the quality of, and is crucial to the development of, an effective assessment. Increasingly, it will be important in withstanding legal challenge as high-stakes assessments influence career progression at all stages of medical education and training. Brown and her colleagues have described a 10-point manifesto based on core educational principles underlying assessment.1 These are shown in Table 1 and underpin our argument.




Table 1 An 'Assessment manifesto', taken from Brown et al.1

1 Assessment should be based on an understanding of how students learn. Assessment should play a positive role in the learning experiences of students.
2 Assessment should accommodate individual differences in students. A diverse range of assessment instruments and processes should be employed, so as not to disadvantage any particular individual or group of learners. Assessment processes and instruments should accommodate and encourage creativity and originality shown by students.
3 The purposes of assessment need to be clearly explained. Staff, students and the outside world need to be able to see why assessment is being used, and the rationale for choosing each individual form of assessment in its particular context.
4 Assessment needs to be valid. By this, we mean that assessment methods should be chosen which directly measure that which they are intended to measure, and not just a reflection in a different medium of the knowledge, skills or competences being assessed.
5 Assessment instruments and processes need to be reliable and consistent. As far as is possible, subjectivity should be eliminated, and assessment should be carried out in ways where the grades or scores that students are awarded are independent of the assessor who happens to mark their work. External examiners and moderators should be active contributors to assessment, rather than observers.
6 All assessment forms should allow students to receive feedback on their learning and their performance. Assessment should be a developmental activity. There should be no hidden agendas in assessment, and we should be prepared to justify to students the grades or scores we award them, and help students to work out how to improve. Even when summative forms of assessment are employed, students should be provided with feedback on their performance, and information to help them identify where their strengths and weaknesses are.
7 Assessment should provide staff and students with opportunities to reflect on their practice and their learning. Assessment instruments and processes should be the subject of continuous evaluation and adjustment. Monitoring and adjustment of the quality of assessment should be built in to quality control processes in universities and professional bodies.
8 Assessment should be an integral component of course design, and not something bolted on afterwards. Teaching and learning elements of each course should be designed in the full knowledge of the sorts of assessment students will encounter, and be designed to help them show the outcomes of their learning under favourable conditions.
9 The amount of assessment should be appropriate. Students' learning should not be impeded by being driven by an overload of assessment requirements, nor should the quality of the teaching conducted by staff be impaired by excessive burdens of assessment tasks.
10 Assessment criteria need to be understandable, explicit and public. Students need to be able to tell what is expected of them in each form of assessment they encounter. Assessment criteria also need to be understandable to employers, and others in the outside world.

Step 1. Planning and preparation

The most important aspects of this stage of the assessment cycle are devising an assessment policy and developing a strategy for assessment. First, the overall purposes of assessment and its role in the curriculum must be made explicit, with time taken to consider why each individual assessment is proposed. Is the assessment formative or summative? Which aspects of the course are to be assessed? Which learning objectives are relevant to the assessment? Are you principally interested in the course as defined by the written curriculum, or by other, wider learning activities? How will standards be set? Will pass/fail decisions be based on a norm-referenced system, where the pass mark is adjusted to allow a set number of candidates to pass, or will some method of criterion referencing, in which the pass mark is based on predetermined standards, be employed?2–4
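The difference between the two standard-setting approaches can be made concrete with a small calculation. The following sketch is our own illustration, not part of the original paper: the scores, pass quota and criterion value are all invented.

```python
import numpy as np

# Invented marks for a cohort of ten candidates.
scores = np.array([42, 48, 51, 55, 58, 60, 63, 67, 71, 80])

# Norm-referenced: the pass mark floats so that a preset proportion of
# the cohort passes (here the top 70%), regardless of absolute standard.
pass_rate = 0.70
norm_pass_mark = np.quantile(scores, 1 - pass_rate)

# Criterion-referenced: the pass mark is fixed in advance against
# predetermined standards, regardless of how this cohort performs.
criterion_pass_mark = 60.0

print(f"Norm-referenced pass mark:      {norm_pass_mark:.1f}")
print(f"Passes under norm referencing:      {(scores >= norm_pass_mark).sum()}/10")
print(f"Criterion-referenced pass mark: {criterion_pass_mark:.1f}")
print(f"Passes under criterion referencing: {(scores >= criterion_pass_mark).sum()}/10")
```

With a weaker cohort the norm-referenced mark would simply drop and still pass seven candidates, whereas the criterion-referenced mark would stay at 60 and fail more of them; this is exactly the policy choice the questions above ask curriculum planners to settle in advance.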
Use of an assessment blueprint

The selection of assessment methods must follow, and not precede, the resolution of these issues. The whole process should start with devising a 'blueprint', test matrix or table of specifications for each assessment, in which the objectives to be assessed are matched to the assessments.5 Blueprints vary in complexity, from an indication of the number of questions taken from each subject area, module or course, to a more complex matrix relating content to behavioural objectives. At a more detailed level, it is also possible to indicate weightings and question types. Examples of different blueprints are given by Newble et al. and O'Donnell et al.6,7 Table 2 shows an outline matrix, based on an example given by Harden and Gleeson for a clinical medical examination,8 in which the content area is related to specific behavioural objectives.

We recommend the use of a diverse range of assessment methods. All individual methods disadvantage some students and, in addition, using a selection of methods allows 'triangulation', with evidence relating to particular aspects of performance arising from different sources. The planning phase also includes deciding the nature and timing of assessments, paying attention to separating formative and summative assessments, although the same methods may be used for the two purposes.




Table 2 Assessment blueprint for a clinical medical examination relating the subject area to be tested to behavioural objectives (adapted from Harden and Gleeson8). Examples of possible assessments are identified in the first row; weightings are shown with each objective.

| Subject area | Take a history (30%) | Conduct a physical examination (20%) | Interpret information and data (20%) | Give information to patients (20%) |
| --- | --- | --- | --- | --- |
| Cardiovascular system | Take a history from a patient with angina | Examine a patient with mitral stenosis | Interpret a chest X-ray showing left ventricular hypertrophy | Give advice to a patient following a myocardial infarction |
| Respiratory system | | | | |
| Central nervous system | | | | |
| Genito-urinary system | | | | |
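In practice a blueprint of this kind can be held as a simple data structure so that coverage can be checked automatically as an examination is assembled. The sketch below is a hypothetical illustration, not a published format: the subject areas and objectives come from Table 2, but the representation, weights and function names are our own.

```python
# A minimal, hypothetical representation of the Table 2 blueprint:
# each cell maps (subject area, behavioural objective) to a planned
# number of items; the weights echo the column weightings in Table 2.
objectives = {
    "Take a history": 0.30,
    "Conduct a physical examination": 0.20,
    "Interpret information and data": 0.20,
    "Give information to patients": 0.20,
}

subjects = ["Cardiovascular system", "Respiratory system",
            "Central nervous system", "Genito-urinary system"]

blueprint = {
    ("Cardiovascular system", "Take a history"): 1,
    ("Cardiovascular system", "Conduct a physical examination"): 1,
    ("Cardiovascular system", "Interpret information and data"): 1,
    ("Cardiovascular system", "Give information to patients"): 1,
    # ... remaining cells are filled in as the examination is planned
}

def uncovered_cells(blueprint, subjects, objectives):
    """Return blueprint cells with no items yet, so gaps in content
    coverage are visible before the assessment is finalized."""
    return [(s, o) for s in subjects for o in objectives
            if blueprint.get((s, o), 0) == 0]

for cell in uncovered_cells(blueprint, subjects, objectives):
    print("No items planned for:", cell)
```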

Importantly, the planning phase also includes designing appropriate staff training in devising, implementing and evaluating all aspects of the assessment programme.

Step 2. Developing and implementing assessment

Once the assessment policy and strategy are agreed, the next phase of the cycle follows. This involves preparing individual assessments, selecting appropriate assessment methods from those identified in the planning phase and writing items for the assessment. Throughout this stage it is important to refer to the assessment blueprint, to ensure coverage of the curriculum and hence content validity. This phase should be strengthened by staff development to familiarize faculty with the proposed question formats. 'Item-writing' parties are a useful strategy for obtaining questions and informing teachers about test construction.9 The preparation of questions includes the development of model answers for written responses, and tick lists or rating forms for observed assessments, together with the proposed marking schemes. Decisions about how the standard will be set should be taken during this stage. If absolute standards are proposed, criteria should be developed and documented by a group of subject experts, with explicit statements defining the performance expected of candidates at different levels of achievement. In high-stakes assessments, particular attention should be paid to precision at the pass/fail level.

If the results of assessments are to be combined, or rules for compensation are proposed, care must be taken to avoid one component having an unintended, undue influence on the overall result. Harmonizing the planned and the achieved weights of the components is a technical task, but it is important that the decision makers understand the need for it, and that the candidates know how scores are to be adjusted and combined, and what pass/fail rules will be adopted.10 This has a profound effect on how candidates will prepare for the assessment and how they will view the importance of learning the material it addresses.
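Why the achieved weight can drift from the planned weight is easiest to see numerically: a component whose marks vary widely dominates a simple weighted sum. One common remedy, sketched below under our own assumptions (invented data, and standardization of each component before the planned weights are applied; the paper itself does not prescribe a method), is to put the components on a common scale first.

```python
import numpy as np

# Hypothetical marks for two components of a combined assessment.
written = np.array([55.0, 60.0, 58.0, 62.0, 57.0])   # narrow spread
osce    = np.array([30.0, 80.0, 45.0, 70.0, 55.0])   # wide spread

planned = {"written": 0.6, "osce": 0.4}

def combine(components, weights):
    """Standardize each component (mean 0, SD 1) so its achieved
    influence matches its planned weight, then form the weighted sum."""
    total = np.zeros_like(next(iter(components.values())))
    for name, scores in components.items():
        z = (scores - scores.mean()) / scores.std()
        total += weights[name] * z
    return total

combined = combine({"written": written, "osce": osce}, planned)
print(np.round(combined, 2))
```

In this invented example the OSCE's raw spread is several times that of the written paper, so without standardization it would dominate the total despite its smaller planned weight.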
Step 3. Presenting the results

During this part of the cycle the assessments are marked and the results presented. Bias is avoided by anonymous and, where feasible, double marking of written responses. Where there are too many candidates for one individual to mark all their responses, one examiner should mark the responses to one item by all candidates, rather than all the items by a few candidates. This reduces the problems that can arise when examiners differ in the rigour with which they apply the marking scheme.

For assessments such as OSCEs, where the number of examiners is a limiting resource, higher reliability is achieved by increasing the number of stations, with each station marked by a single examiner, rather than by having fewer stations marked by multiple observers.11
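The size of this effect can be estimated with the Spearman–Brown prophecy formula from classical test theory (our illustration; the formula is not given in the paper itself). If a test with reliability R is lengthened by a factor k, for example by using k times as many single-examiner stations, the predicted reliability is

$$R_k = \frac{kR}{1 + (k - 1)R}.$$

So an OSCE with reliability 0.5 that is doubled in length (k = 2) is predicted to reach 2 × 0.5 / (1 + 0.5) ≈ 0.67.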




Providing feedback to students

Presentation of results also requires providing effective feedback to students. This is an integral part of formative assessment and, arguably, similarly important for summative assessment. Students often receive little feedback from summative assessments other than an indication of the grade or mark obtained. Meaningful feedback enables students to identify their strengths and weaknesses and helps them improve their subsequent performance.12,13 Feedback can take many forms: it can be written or oral, and given on an individual or group basis. Guidelines for efficient feedback include using standardized feedback forms and computer banks of commonly used statements. Effective feedback should be immediate, or given as soon after the assessment as is feasible.14 Studies have shown that students value immediate feedback from OSCEs.15
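A 'computer bank of commonly used statements' can be as simple as a lookup from observed problems to pre-drafted comments, assembled per candidate. The fragment below is purely our own illustration of the idea; the statement texts, keys and function name are invented.

```python
# Hypothetical bank of reusable feedback statements, keyed by the
# weakness observed; real banks would be larger and discipline-specific.
statement_bank = {
    "closed_questions": "You relied heavily on closed questions; practise "
                        "opening consultations with open questions.",
    "summary_missing":  "You did not summarize back to the patient; a brief "
                        "summary checks understanding and builds rapport.",
    "history_order":    "Your history-taking lacked a systematic structure; "
                        "work through the presenting complaint, past history "
                        "and drug history in order.",
}

def build_feedback(candidate, observed_codes):
    """Assemble an individual feedback note from the shared bank."""
    lines = [f"Feedback for {candidate}:"]
    lines += ["- " + statement_bank[code] for code in observed_codes]
    return "\n".join(lines)

print(build_feedback("Candidate 017", ["closed_questions", "summary_missing"]))
```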
Step 4. Evaluating the assessment

The evaluation of assessment is frequently left out of the assessment cycle in the general hurly-burly of the academic year. To our minds, this is perhaps the most important stage, driving the assessment cycle and prompting further developments. Evaluation can also feed into curriculum planning by highlighting the strengths and weaknesses of the curriculum.

First level evaluation

The evaluation step can be divided into four levels. The first level relates to the individual questions or items that make up the assessment. Computer packages for marking multiple-choice questions often provide in-built item analysis, calculating the difficulty index (the percentage of respondents who gave the correct response) and the discrimination index (a comparison of the numbers of high-scoring and low-scoring students who gave the correct response) for each question. Inspection of the responses and scores in both written tests and OSCEs gives an indication of the quality of the candidates' answers and of the model answers. Were the students able to answer each question or perform each task? If not, it is important to consider why. Have students not understood the question? Was the question or OSCE station poorly written or ambiguous? Does the question match the learning objectives?
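Both indices are straightforward to compute from a matrix of scored responses. The sketch below follows common conventions rather than any package cited in the paper: difficulty as the proportion answering correctly, and discrimination as the difference in that proportion between the top and bottom 27% of candidates by total score. The data and names are invented.

```python
import numpy as np

def item_analysis(responses, group_frac=0.27):
    """responses: candidates x items matrix of 0/1 (incorrect/correct).
    Returns per-item difficulty and discrimination indices."""
    responses = np.asarray(responses, dtype=float)
    totals = responses.sum(axis=1)

    # Difficulty index: proportion of all candidates answering correctly.
    difficulty = responses.mean(axis=0)

    # Discrimination index: proportion correct in the top-scoring group
    # minus proportion correct in the bottom-scoring group.
    n = max(1, int(round(group_frac * len(totals))))
    order = np.argsort(totals)
    low, high = responses[order[:n]], responses[order[-n:]]
    discrimination = high.mean(axis=0) - low.mean(axis=0)
    return difficulty, discrimination

# Five candidates answering four items (invented data).
matrix = [[1, 0, 1, 1],
          [1, 1, 1, 0],
          [0, 0, 1, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
diff, disc = item_analysis(matrix)
print("difficulty:    ", np.round(diff, 2))
print("discrimination:", np.round(disc, 2))
```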
Second level evaluation

Second level evaluation involves establishing the validity and calculating the reliability of the assessment. If the assessment has been developed against a blueprint, content validity (does the assessment measure what it is intended to measure?) is built in. However, it is also important to consider predictive validity. For example, if students pass an assessment of communication skills in year 1, but subsequently fail the communication skills element of an assessment of clinical competence, this might indicate problems with the validity of either assessment (or an unfortunate effect of the course!).

Reliability refers to the extent to which the assessment provides a consistent measure of whatever it is measuring. Tests of competence which generate numerical scores should report a measure of internal consistency (Cronbach's alpha) of 0.8 or greater. However, the most important source of error influencing the reliability of a test stems from the consistently documented finding of 'content specificity': the performance of a candidate on one test item is poorly correlated with performance on another, and is specific to content.16 To achieve generalizable, stable scores, many test elements are required, leading to the need for longer testing time and the search for efficient test methods.
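For reference, the internal consistency statistic mentioned above is computed as follows (the standard formula for Cronbach's alpha, supplied here for convenience rather than taken from the paper): for a test of k items with item-score variances σ²ᵢ and total-score variance σ²ₓ,

$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right).$$

Alpha approaches 1 when items covary strongly relative to their individual spread; the 0.8 level quoted above is a conventional benchmark for high-stakes tests.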
Third level evaluation

The external examiner has a crucial role in the third level of the evaluation step. He or she should be involved in the evaluation of both the assessment process and the review of assessments. This involves inspection of the assessments and model answers before the assessment has taken place, and review of a selection of marked papers. A thorough critique of questions and expected responses, prior to the assessment, should help to identify any problems with the validity of the assessment and provide feedback on whether the assessment is at an appropriate level. External examiners should also attend non-written assessments (e.g. OSCEs) to observe proceedings. Obviously, it is unlikely to be feasible, or desirable, for all external examiners to attend all such assessments; ideally, an external examiner should be present at a sample of the assessments to enable him or her to observe the organization and ensure fairness. The presence of the external examiner also provides an opportunity to evaluate and comment on the face validity of the assessments.

The evaluation phase offers an opportunity to review the pass/fail standards set for the assessment. This can occur both at the level of the individual items that comprise the assessment and at the level of the assessment as a whole. In each case, this is a useful point at which to consider whether prespecified standards are at an appropriate level.

Fourth level evaluation




Figure 1. The assessment cycle.

Levels 1–3 include the more immediate aspects of evaluation. Level 4 involves taking a 'long-term' view of assessment and maintaining a weather eye for trends. If it becomes apparent that several cohorts of students are performing badly in one objective, skill or subject area, this may point to a deficiency in the curriculum. Once identified, this may lead to the development of new assessments, thus starting another round of the assessment cycle.

Is assessment congruent with all aspects of the curriculum?

Besides evaluating the technical aspects of assessment at the levels described above, it is also important to consider the wider issue of how assessments relate to the total learning experience encountered by students. Inevitably, there is a disparity between the curriculum 'in action', which is delivered to the students, and the 'curriculum experience', as received by the students. In Fig. 2(a), each solid circle represents one facet of the curriculum: the planned or written curriculum; the taught curriculum, which is delivered to the students via a variety of mechanisms; and the learned curriculum, which the students experience. Area (a) represents learning that occurs outside the planned and taught curricula as a result of peer and social influences and the climate of learning. This includes favourable influences, for example adopting appropriate dress for clinical contact and developing group-work and self-study skills, and less desirable influences, such as adopting short-term learning strategies and reinforcing social stereotypes.

Figure 2. (a) The relationship between the taught, learned and planned curriculum.

Assessment of area (b) represents assessment 'by ambush' and should be avoided. A great deal of importance is placed on the desirable learning outcomes of student-centred learning, which are included in area (a). These considerations are crucial in innovative, student-centred courses, in particular those involving problem-based learning, where student-identified learning objectives are so important. Such important, but unidentified, outcomes may include team-working and communication skills, as well as social skills and independent learning skills. These skills are not usually 'examined' and may not be assessed at any time in the formal curriculum. However, changes in the way in which postgraduate trainees are to be assessed, and the present emphasis on peer review of the performance of established clinicians,17,18 will ensure that greater attention is soon paid to the assessment of these qualities in the undergraduate curriculum.

Figure 2. (b) The relationship between assessment and facets of the curriculum: assessing the parts we don't normally assess.




Conclusion

Effective assessment is a continuous cycle of development, implementation, presentation and evaluation. As part of the evaluation phase of the cycle, medical educators should consider whether the assessment process is congruent with the students' total learning experience. This presents a major challenge: to identify those learning outcomes which are not assessed, and to develop effective methods for their assessment.

References

1 Brown S, Race P, Smith B. An assessment manifesto. 500 Tips on Assessment. London: Kogan Page; 1996. p. 142–3.
2 Lowry S. Assessment of students. BMJ 1993;306:51–4.
3 Turnbull JM. What is normative versus criterion-referenced assessment? Med Teacher 1989;11:145–50.
4 Cusimano MD. Standard setting in medical education. Acad Med 1996;71:S112–S120.
5 Newble D, Dauphinee D, Macdonald M, Mulholland H, Dawson B, Page G, Swanson D, Thomson A, van der Vleuten C. Guidelines for assessing clinical competence. Teach Learn Med 1994;6:213–9.
6 Newble DI, Hoare J, Elmslie RG. The validity and reliability of a new examination of the clinical competence of medical students. Med Educ 1981;15:46–52.
7 O'Donnell MJ, Obenshain SS, Erdmann JB. Background essential to the proper use of results of Step 1 and Step 2 of the USMLE. Acad Med 1993;68:734–9.
8 Harden RM, Gleeson FA. Assessment of clinical competence using an objective structured clinical examination (OSCE). Med Educ 1979;13:41–54.
9 Case SM, Swanson DB. Extended matching items: a practical alternative to free-response questions. Teach Learn Med 1993;1:107–15.
10 Foulkes J. Combining components of assessment. In: Newble D, Jolly B, Wakeford R, editors. The Certification and Recertification of Doctors: Issues in the Assessment of Clinical Competence. Cambridge: Cambridge University Press; 1994. p. 134–50.
11 Newble DI, Swanson DB. Psychometric characteristics of the objective structured clinical examination. Med Educ 1988;22:325–34.
12 Brown S, Knight P. Purposes of assessment. Assessing Learners in Higher Education. London: Kogan Page; 1994. p. 30–41.
13 Hodder RV, Rivington RN, Calcutt LE, Hart IR. The effectiveness of immediate feedback during the OSCE. Med Educ 1989;23:184–8.
14 Brown S, Knight P. Speedier assessment: ways of giving feedback. Assessing Learners in Higher Education. London: Kogan Page; 1994. p. 112–3.
15 Black NMI, Harden RM. Providing feedback to students on clinical skills by using the Objective Structured Clinical Examination. Med Educ 1986;20:48–52.
16 van der Vleuten CPM, Norman GR, de Graaff E. Pitfalls in the pursuit of objectivity: issues of reliability. Med Educ 1991;25:110–8.
17 Irvine D. The performance of doctors. I: Professionalism and self-regulation in a changing world. BMJ 1997;314:1540–2.
18 Irvine D. The performance of doctors. II: Maintaining good practice, protecting patients from poor performance. BMJ 1997;314:1613–5.

Received 13 January 1999; accepted for publication 19 January 1999

