Background Methods chosen for assessment and the manner in which they are applied are so intimately associated with how individuals learn that developing appropriate assessment strategies is a key part of effective curriculum development.

The assessment cycle We describe a four-stage assessment cycle identifying important steps in assessment. Each step is described in detail, stressing its key aspects, including: the need for clear assessment policy and strategy, the importance of assessment blueprints in planning assessment, the need for effective feedback when presenting results, and the essential, but often overlooked, need to evaluate the assessment process itself.

Evaluating assessment This final evaluation stage is the most important part of the assessment cycle and can be divided into four levels. The first level includes evaluating each question in the assessment, the second level is concerned with establishing validity and reliability, the third level centres on the assessment process and the review of assessments by external examiners and the fourth level involves evaluation over several assessments.

Relating assessment to the curriculum This long-term evaluation should examine whether existing assessments are congruent with the curriculum and relate to all facets of the students' learning experiences. This is particularly important in a curriculum where the learning outcomes of student-centred learning are emphasized. Changes in the assessment of postgraduate trainees and increasing emphasis on peer review of clinicians will raise the profile of these outcomes in undergraduate education.

Keywords Undergraduate medical education, *organization, *educational assessment, curriculum, programme evaluation, feedback, Great Britain.

Medical Education 1999;33:276–281
1 Assessment should be based on an understanding of how students learn. Assessment should play a positive role in the learning experiences of students.

2 Assessment should accommodate individual differences in students. A diverse range of assessment instruments and processes should be employed, so as not to disadvantage any particular individual or group of learners. Assessment processes and instruments should accommodate and encourage creativity and originality shown by students.

3 The purposes of assessment need to be clearly explained. Staff, students and the outside world need to be able to see why assessment is being used and the rationale for choosing each individual form of assessment in its particular context.

4 Assessment needs to be valid. By this, we mean that assessment methods should be chosen which directly measure what they are intended to measure, and not just a reflection in a different medium of the knowledge, skills or competences being assessed.

5 Assessment instruments and processes need to be reliable and consistent. As far as is possible, subjectivity should be eliminated, and assessment should be carried out in ways where the grades or scores that students are awarded are independent of the assessor who happens to mark their work. External examiners and moderators should be active contributors to assessment, rather than observers.

6 All assessment forms should allow students to receive feedback on their learning and their performance. Assessment should be a developmental activity. There should be no hidden agendas in assessment, and we should be prepared to justify to students the grades or scores we award them, and help students to work out how to improve. Even when summative forms of assessment are employed, students should be provided with feedback on their performance, and information to help them identify where their strengths and weaknesses are.

7 Assessment should provide staff and students with opportunities to reflect on their practice and their learning. Assessment instruments and processes should be the subject of continuous evaluation and adjustment. Monitoring and adjustment of the quality of assessment should be built in to quality control processes in universities and professional bodies.

8 Assessment should be an integral component of course design, and not something bolted on afterwards. Teaching and learning elements of each course should be designed in the full knowledge of the sorts of assessment students will encounter, and be designed to help them show the outcomes of their learning under favourable conditions.

9 The amount of assessment should be appropriate. Students' learning should not be impeded by being driven by an overload of assessment requirements, nor should the quality of the teaching conducted by staff be impaired by excessive burdens of assessment tasks.

10 Assessment criteria need to be understandable, explicit and public. Students need to be able to tell what is expected of them in each form of assessment they encounter. Assessment criteria also need to be understandable to employers, and others in the outside world.
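Principle 5's demand for reliable, consistent scoring is usually checked statistically; internal consistency is commonly summarized with Cronbach's alpha, the measure referred to later in the evaluation step. As a minimal sketch only (pure Python, with a small hypothetical candidates-by-items score matrix), the calculation looks like this:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a candidates x items score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)
    Population variances (divide by n) are used throughout.
    """
    n = len(scores)      # number of candidates
    k = len(scores[0])   # number of items

    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical scores: 4 candidates x 3 items
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
print(round(cronbach_alpha(scores), 2))  # → 0.97
```

In practice a statistics package would be used on the full score matrix; the convention noted later in this article is that a test of competence should report alpha of 0.8 or greater.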
must be made explicit, with time taken to consider why each individual assessment is proposed. Is the assessment formative or summative? Which aspects of the course are to be assessed? Which learning objectives are relevant to the assessment? Are you principally interested in the course as defined by the written curriculum, or by other, wider, learning activities? How will standards be set? Will pass/fail decisions be based on a norm-referenced system where the pass mark is adjusted to allow a set number of candidates to pass, or will some method of criterion-referencing, in which the pass mark is based on predetermined standards, be employed?2–4

Use of an assessment blueprint

The selection of assessment methods must follow and not precede the resolution of these issues. The whole process should start with devising a 'blueprint', test-matrix or table of specifications for each assessment, in which the objectives to be assessed are matched to the assessments.5 Blueprints vary in complexity, from an indication of the number of questions taken from each subject area, module or course, to a more complex matrix relating content to behavioural objectives. At a more detailed level, it is also possible to indicate weightings and question types. Examples of different blueprints are given by Newble et al. and O'Donnell et al.6,7 Table 2 shows an outline matrix, based on an example given by Harden and Gleeson for a clinical medical examination,8 in which the content area is related to specific behavioural objectives.

We recommend the use of a diverse range of assessment methods. All individual methods disadvantage some students, and, in addition, using a selection of methods allows 'triangulation' with evidence relating to particular aspects of performance arising from different sources. The planning phase also includes deciding the nature and timing of assessments, paying attention to separating formative and summative assessments, although the same methods may be used for the two purposes. Importantly, the planning phase also includes
Table 2 Assessment blueprint for a clinical medical examination relating subject area to be tested to behavioural objectives (adapted from Harden and Gleeson8). Examples of possible assessments are identified in the first row.

                         Behavioural objectives
Cardiovascular system    Take history from a patient with angina | Examine a patient with mitral stenosis | Interpret chest X-ray showing left ventricular hypertrophy | Give advice to a patient following a myocardial infarction
Respiratory system
Central nervous system
Genito-urinary system
designing appropriate staff training in devising, implementing and evaluating all aspects of the assessment programme.

Step 2. Developing and implementing assessment

Once the assessment policy and strategy are agreed, the next phase of the cycle follows. This involves preparing individual assessments, selecting appropriate assessment methods from those identified in the planning phase and writing items for the assessment. It is important to refer to the assessment blueprint to ensure coverage of the curriculum throughout this stage in order to ensure content validity. This phase should be strengthened by staff development to familiarize faculty with the proposed question formats. 'Item writing' parties are a useful strategy for obtaining questions and informing the teachers about test construction.9 The preparation of questions includes the development of model answers for written responses, and tick lists or rating forms for observed assessments, together with the proposed marking schemes. Decisions about how the standard will be set should be taken during this stage. If absolute standards are proposed, criteria should be developed and documented by a group of subject experts, with explicit statements defining the performance expected of candidates at different levels of achievement. In high-stakes assessments particular attention should be paid to precision at the pass/fail level.

If the results of assessments are to be combined, or rules for compensation are proposed, care must be taken to avoid one component having unintended undue influence on the overall result. Harmonizing the planned and the achieved weights of the components is a technical task, but it is important that the decision makers understand the need for it, and that the candidates know how scores are to be adjusted and combined, and what pass/fail rules will be adopted.10 This has a profound effect on how candidates will prepare for the assessment and how they will view the importance of learning the material it addresses.

Step 3. Presenting the results

During this part of the cycle the assessments are marked and the results presented. Bias is avoided by anonymous and double marking of written responses (where feasible). Where there are too many candidates for one individual to mark all their responses, one examiner should mark the responses to one item by all candidates (rather than all the items by a few candidates). This reduces the problems that can arise when examiners differ in the rigour with which they apply the marking scheme.

For assessments such as OSCEs where the number of examiners is a limiting resource, higher reliability is achieved by increasing the number of stations with each station marked by a single examiner, rather than having fewer stations marked by multiple observers.11

Providing feedback to students

Presentation of results also requires providing effective feedback to students. This is an integral part of formative assessments and, arguably, similarly important
for summative assessments. Students often receive little feedback from summative assessments other than an indication of the grade or mark obtained. Meaningful feedback enables students to identify their strengths and weaknesses and helps them improve their subsequent performance.12,13 Feedback can take many forms and can be written or oral, and given on an individual or group basis. Guidelines for efficient feedback include using standardized feedback forms and computer banks of commonly used statements. Effective feedback should be immediate or given as soon after the assessment as is feasible.14 Studies have shown that students value immediate feedback from OSCEs.15

Step 4. Evaluating the assessment

The evaluation of assessment is frequently left out of the assessment cycle in the general hurly-burly of the academic year. To our minds, this is perhaps the most important stage, driving the assessment cycle and prompting further developments. Evaluation can also feed into curriculum planning by highlighting the strengths and weaknesses of the curriculum.

First level evaluation

The evaluation step can be divided into four levels. The first level relates to the individual questions or items that make up the assessment. Computer packages for marking multiple choice questions often provide in-built item analysis, calculating the difficulty index (percentage of respondents who gave the correct response) and discrimination index (a comparison of the numbers of high scoring and low scoring students who gave the correct response) for each question. Inspection of the responses and scores in both written tests and OSCEs gives an indication of the quality of the candidates' and the model answers. Were the students able to answer each question or perform each task? If not, it is important to consider why. Have students not understood the question? Was the question or OSCE station poorly written or ambiguous? Does the question match the learning objectives?

Second level evaluation

Second level evaluation involves establishing validity and calculating reliability of the assessment.

If the assessment has been developed against a blueprint, content validity (does the assessment measure what it is intended to measure?) is built in. However, it is important to consider predictive validity. For example, if students pass an assessment of communication skills in year 1, but subsequently fail the communications skills element of an assessment of clinical competence, this might indicate problems with the validity of either assessment (or an unfortunate effect of the course!).

Reliability refers to the extent to which the assessment provides a consistent measure of whatever it is measuring. Tests of competence which generate numerical scores should report a measure of internal consistency (Cronbach alpha) with a level of 0.8 or greater. But the most important source of error influencing the reliability of a test stems from the consistently documented finding of 'content specificity': the performance of a candidate on one test item is poorly correlated with performance on another, and is specific to content.16 To achieve generalizable or stable scores many test elements are required, leading to the need for longer testing time and the search for efficient test methods.

Third level evaluation

The external examiner has a crucial role as part of the third level of the evaluation step. He or she should be involved in evaluation of both the assessment process and the review of assessments. This involves inspection of the assessments and model answers before the assessment has taken place, and review of a selection of marked papers. A thorough critique of questions and expected responses, prior to the assessment, should help to identify any problems around the validity of the assessment and provide feedback on whether the assessment is at an appropriate level. External examiners should also attend non-written assessments (e.g. OSCEs) to observe proceedings. Obviously, it is unlikely that it will be feasible or desirable for all external examiners to attend all such assessments; ideally an external examiner should be present at a sample of the assessments to enable him or her to observe the organization and ensure fairness. The presence of the external examiner provides an opportunity to evaluate and comment on the face validity of the assessments.

The evaluation phase offers an opportunity to review the pass/fail standards set for the assessment. This can occur both at the level of the individual items that comprise the assessment, and at the level of the assessment as a whole. In each case, this is a useful point at which to consider whether prespecified standards are at an appropriate level.

Fourth level evaluation

Levels 1–3 include the more immediate aspects of evaluation. Level 4 involves taking a 'long-term' view of
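The item statistics described under first-level evaluation are straightforward to compute without a dedicated marking package. A minimal sketch in Python, using hypothetical 0/1 response data; note that the discrimination index shown here is the simple upper-minus-lower proportion, one of several variants in common use:

```python
def item_analysis(responses):
    """Difficulty and discrimination indices for a single item.

    responses: list of (total_test_score, correct) pairs, where
    `correct` is True if the candidate answered this item correctly.
    Difficulty index: percentage of candidates answering correctly.
    Discrimination index: proportion correct in the top-scoring half
    minus proportion correct in the bottom-scoring half (splitting
    into thirds is an equally common convention).
    """
    n = len(responses)
    difficulty = 100 * sum(1 for _, ok in responses if ok) / n

    ranked = sorted(responses, key=lambda r: r[0], reverse=True)
    half = n // 2
    top, bottom = ranked[:half], ranked[-half:]
    p_top = sum(1 for _, ok in top if ok) / half
    p_bottom = sum(1 for _, ok in bottom if ok) / half
    return difficulty, p_top - p_bottom

# Hypothetical data: (total score, answered this item correctly)
responses = [
    (90, True), (85, True), (80, True), (70, False),
    (60, True), (55, False), (40, False), (30, False),
]
difficulty, discrimination = item_analysis(responses)
print(difficulty, discrimination)  # → 50.0 0.5
```

A positive discrimination index indicates that stronger candidates overall were more likely to answer the item correctly; values near zero or negative flag items worth the kind of inspection the article recommends.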