
The New England Journal of Medicine

Review article

Medical Education
Malcolm Cox, M.D., and David M. Irby, Ph.D., Editors

Assessment in Medical Education


Ronald M. Epstein, M.D.

From the Departments of Family Medicine, Psychiatry, and Oncology and the Rochester Center to Improve Communication in Health Care, University of Rochester School of Medicine and Dentistry, Rochester, NY. Address reprint requests to Dr. Epstein at 1381 South Ave., Rochester, NY 14620, or at ronald_epstein@urmc.rochester.edu.

N Engl J Med 2007;356:387-96.
Copyright © 2007 Massachusetts Medical Society.

As an attending physician working with a student for a week, you receive a form that asks you to evaluate the student’s fund of knowledge, procedural skills, professionalism, interest in learning, and “systems-based practice.” You wonder which of these attributes you can reliably assess and how the data you provide will be used to further the student’s education. You also wonder whether other tests of knowledge and competence that students must undergo before they enter practice are equally problematic.

In one way or another, most practicing physicians are involved in assessing the competence of trainees, peers, and other health professionals. As the example above suggests, however, they may not be as comfortable using educational assessment tools as they are using more clinically focused diagnostic tests. This article provides a conceptual framework for and a brief update on commonly used and emerging methods of assessment, discusses the strengths and limitations of each method, and identifies several challenges in the assessment of physicians’ professional competence and performance.

Competence and Performance

Elsewhere, Hundert and I have defined competence in medicine as “the habitual and
judicious use of communication, knowledge, technical skills, clinical reasoning,
emotions, values, and reflection in daily practice for the benefit of the individuals
and communities being served.”1 In the United States, the assessment of medical
residents, and increasingly of medical students, is largely based on a model that
was developed by the Accreditation Council for Graduate Medical Education (ACGME).
This model uses six interrelated domains of competence: medical knowledge, patient
care, professionalism, communication and interpersonal skills, practice-based learn-
ing and improvement, and systems-based practice.2
Competence is not an achievement but rather a habit of lifelong learning3; as-
sessment plays an integral role in helping physicians identify and respond to their
own learning needs. Ideally, the assessment of competence (what the student or
physician is able to do) should provide insight into actual performance (what he or
she does habitually when not observed), as well as the capacity to adapt to change,
find and generate new knowledge, and improve overall performance.4
Competence is contextual, reflecting the relationship between a person’s abilities
and the tasks he or she is required to perform in a particular situation in the real
world.5 Common contextual factors include the practice setting, the local prevalence
of disease, the nature of the patient’s presenting symptoms, the patient’s educational
level, and other demographic characteristics of the patient and of the physician.
Many aspects of competence, such as history taking and clinical reasoning, are
also content-specific and not necessarily generalizable to all situations. A student’s


clinical reasoning may appear to be competent in areas in which his or her base of knowledge is well organized and accessible6 but may appear to be much less competent in unfamiliar territory.7 However, some important skills (e.g., the ability to form therapeutic relationships) may be less dependent on content.8

Competence is also developmental. Habits of mind and behavior and practical wisdom are gained through deliberate practice9 and reflection on experience.10-14 Students begin their training at a novice level, using abstract, rule-based formulas that are removed from actual practice. At higher levels, students apply these rules differentially to specific situations. During residency, trainees make judgments that reflect a holistic view of a situation and eventually take diagnostic shortcuts based on a deeper understanding of underlying principles. Experts are able to make rapid, context-based judgments in ambiguous real-life situations and have sufficient awareness of their own cognitive processes to articulate and explain how they recognize situations in which deliberation is essential. Development of competence in different contexts and content areas may proceed at different rates. Context and developmental level also interact. Although all clinicians may perform at a lower level of competence when they are tired, distracted, or annoyed, the competence of less experienced clinicians may be particularly susceptible to the influence of stress.15,16

Goals of Assessment

Over the past decade, medical schools, postgraduate training programs, and licensing bodies have made new efforts to provide accurate, reliable, and timely assessments of the competence of trainees and practicing physicians.1,2,17 Such assessments have three main goals: to optimize the capabilities of all learners and practitioners by providing motivation and direction for future learning, to protect the public by identifying incompetent physicians, and to provide a basis for choosing applicants for advanced training.

Assessment can be formative (guiding future learning, providing reassurance, promoting reflection, and shaping values) or summative (making an overall judgment about competence, fitness to practice, or qualification for advancement to higher levels of responsibility). Formative assessments provide benchmarks to orient the learner who is approaching a relatively unstructured body of knowledge. They can reinforce students’ intrinsic motivation to learn and inspire them to set higher standards for themselves.18 Although summative assessments are intended to provide professional self-regulation and accountability, they may also act as a barrier to further practice or training.19

A distinction should be made between assessments that are suitable only for formative use and those that have sufficient psychometric rigor for summative use. This distinction is especially important in selecting a method of evaluating competence for high-stakes assessments (i.e., licensing and certification examinations). Correspondingly, summative assessments may not provide sufficient feedback to drive learning.20 However, because students tend to study that which they expect to be tested on, summative assessment may influence learning even in the absence of feedback.

Assessment Methods

All methods of assessment have strengths and intrinsic flaws (Table 1). The use of multiple observations and several different assessment methods over time can partially compensate for flaws in any one method.1,21 Van der Vleuten22 describes five criteria for determining the usefulness of a particular method of assessment: reliability (the degree to which the measurement is accurate and reproducible), validity (whether the assessment measures what it claims to measure), impact on future learning and practice, acceptability to learners and faculty, and costs (to the individual trainee, the institution, and society at large).

Written Examinations

Written examination questions are typically classified according to whether they are open-ended or multiple choice. In addition, questions can be “context rich” or “context poor.”23 Questions with rich descriptions of the clinical context invite the more complex cognitive processes that are characteristic of clinical practice.24 Conversely, context-poor questions can test basic factual knowledge but not its transferability to real clinical problems.

Multiple-choice questions are commonly used for assessment because they can provide a large number of examination items that encompass many content areas, can be administered in a relatively short period, and can be graded by computer.


These factors make the administration of the examination to large numbers of trainees straightforward and standardized.25 Formats that ask the student to choose the best answer from a list of possible answers are most commonly used. However, newer formats may better assess processes of diagnostic reasoning. Key-feature items focus on critical decisions in particular clinical cases.26 Script-concordance items present a situation (e.g., vaginal discharge in a patient), add a piece of information (dysuria), and ask the examinee to assess the degree to which this new information increases or decreases the probability of a particular outcome (acute salpingitis due to Chlamydia trachomatis).27 Because the situations portrayed are ambiguous, script-concordance items may provide insight into clinical judgment in the real world. Answers to such items have been shown to correlate with the examinee’s level of training and to predict future performance on oral examinations of clinical reasoning.28

Multiple-choice questions that are rich in context are difficult to write, and those who write them tend to avoid topics — such as ethical dilemmas or cultural ambiguities — that cannot be asked about easily.29 Multiple-choice questions may also create situations in which an examinee can answer a question by recognizing the correct option, but could not have answered it in the absence of options.23,30 This effect, called cueing, is especially problematic when diagnostic reasoning is being assessed, because premature closure — arriving at a decision before the correct diagnosis has been considered — is a common reason for diagnostic errors in clinical practice.31,32 Extended matching items (several questions, all with the same long list of possible answers), as well as open-ended short-answer questions, can minimize cueing.23 Structured essays also preclude cueing. In addition, they involve more complex cognitive processes and allow for more contextualized answers than do multiple-choice questions. When clear grading guidelines are in place, structured essays can be psychometrically robust.

Assessments by Supervising Clinicians

Supervising clinicians’ observations and impressions of students over a specific period remain the most common tool used to evaluate performance with patients. Students and residents most commonly receive global ratings at the end of a rotation, with comments from a variety of supervising physicians. Although subjectivity can be a problem in the absence of clearly articulated standards, a more important issue is that direct observation of trainees while they are interacting with patients is too infrequent.33

Direct Observation or Video Review

The “long case”34 and the “mini–clinical-evaluation exercise” (mini-CEX)35 have been developed so that learners will be directly observed more frequently. In these assessments, a supervising physician observes while a trainee performs a focused history taking and physical examination over a period of 10 to 20 minutes. The trainee then presents a diagnosis and a treatment plan, and the faculty member rates the resident and may provide educational feedback. Structured exercises with actual patients under the observation of the supervising physician can have the same level of reliability as structured examinations using standardized patients34,36 yet encompass a wider range of problems, physical findings, and clinical settings. Direct observation of trainees in clinical settings can be coupled with exercises that trainees perform after their encounters with patients, such as oral case presentations, written exercises that assess clinical reasoning, and literature searches.8,37 In addition, review of videos of encounters with patients offers a powerful means of evaluating and providing feedback on trainees’ skills in clinical interactions.8,38

Clinical Simulations

Standardized patients — actors who are trained to portray patients consistently on repeated occasions — are often incorporated into objective structured clinical examinations (OSCEs), which consist of a series of timed “stations,” each one focused on a different task. Since 2004, these examinations have been part of the U.S. Medical Licensing Examination that all senior medical students take.39 The observing faculty member or the standardized patient uses either a checklist of specific behaviors or a global rating form to evaluate the student’s performance.40 The checklist might include items such as “asked if the patient smoked” and “checked ankle reflexes.” The global rating form might ask for a rating of how well the visit was organized and whether the student was appropriately empathetic. A minimum of 10 stations, which the student usually visits over the course of 3 to 4 hours, is necessary to achieve

Table 1. Commonly Used Methods of Assessment.

Written exercises

Multiple-choice questions in either single-best-answer or extended matching format
  Domain: Knowledge, ability to solve problems
  Type of use: Summative assessments within courses or clerkships; national in-service, licensing, and certification examinations
  Limitations: Difficult to write, especially in certain content areas; can result in cueing; can seem artificial and removed from real situations
  Strengths: Can assess many content areas in relatively little time, have high reliability, can be graded by computer

Key-feature and script-concordance questions
  Domain: Clinical reasoning, problem-solving ability, ability to apply knowledge
  Type of use: National licensing and certification examinations
  Limitations: Not yet proven to transfer to real-life situations that require clinical reasoning
  Strengths: Assess clinical problem-solving ability, avoid cueing, can be graded by computer

Short-answer questions
  Domain: Ability to interpret diagnostic tests, problem-solving ability, clinical reasoning skills
  Type of use: Summative and formative assessments in courses and clerkships
  Limitations: Reliability dependent on training of graders
  Strengths: Avoid cueing, assess interpretation and problem-solving ability

Structured essays
  Domain: Synthesis of information, interpretation of medical literature
  Type of use: Preclinical courses, limited use in clerkships
  Limitations: Time-consuming to grade; must work to establish interrater reliability; long testing time required to encompass a variety of domains
  Strengths: Avoid cueing, use higher-order cognitive processes

Assessments by supervising clinicians

Global ratings with comments at end of rotation
  Domain: Clinical skills, communication, teamwork, presentation skills, organization, work habits
  Type of use: Global summative and sometimes formative assessments in clinical rotations
  Limitations: Often based on second-hand reports and case presentations rather than on direct observation; subjective
  Strengths: Use of multiple independent raters can overcome some variability due to subjectivity

Structured direct observation with checklists for ratings (e.g., mini–clinical-evaluation exercise or video review)
  Domain: Communication skills, clinical skills
  Type of use: Limited use in clerkships and residencies, a few board-certification examinations
  Limitations: Selective rather than habitual behaviors observed; relatively time-consuming
  Strengths: Feedback provided by credible experts

Oral examinations
  Domain: Knowledge, clinical reasoning
  Type of use: Limited use in clerkships and comprehensive medical school assessments, some board-certification examinations
  Limitations: Subjective; sex and race bias has been reported; time-consuming; require training of examiners; summative assessments need two or more examiners
  Strengths: Feedback provided by credible experts

Clinical simulations

Standardized patients and objective structured clinical examinations
  Domain: Some clinical skills, interpersonal behavior, communication skills
  Type of use: Formative and summative assessments in courses, clerkships, medical schools, national licensure examinations, board certification in Canada
  Limitations: Timing and setting may seem artificial; require suspension of disbelief; checklists may penalize examinees who use shortcuts; expensive
  Strengths: Tailored to educational goals; reliable, consistent case presentation and ratings; can be observed by faculty or standardized patients; realistic

Incognito standardized patients
  Domain: Actual practice habits
  Type of use: Primarily used in research; some courses, clerkships, and residencies use for formative feedback
  Limitations: Requires prior consent; logistically challenging; expensive
  Strengths: Very realistic, most accurate way of assessing clinician’s behavior

High-technology simulations
  Domain: Procedural skills, teamwork, simulated clinical dilemmas
  Type of use: Formative and some summative assessment
  Limitations: Timing and setting may seem artificial; require suspension of disbelief; checklists may penalize examinees who use shortcuts; expensive
  Strengths: Tailored to educational goals, can be observed by faculty, often realistic and credible

Multisource (“360-degree”) assessments

Peer assessments
  Domain: Professional demeanor, work habits, interpersonal behavior, teamwork
  Type of use: Formative feedback in courses and comprehensive medical school assessments; formative assessment for board recertification
  Limitations: Confidentiality, anonymity, and trainee buy-in essential
  Strengths: Ratings encompass habitual behaviors; credible source; correlate with future academic and clinical performance

Patient assessments
  Domain: Ability to gain patients’ trust; patient satisfaction; communication skills
  Type of use: Formative and summative; board recertification; use by insurers to determine bonuses
  Limitations: Provide global impressions rather than analysis of specific behaviors; ratings generally high with little variability
  Strengths: Credible source of assessment

Self-assessments
  Domain: Knowledge, skills, attitudes, beliefs, behaviors
  Type of use: Formative
  Limitations: Do not accurately describe actual behavior unless training and feedback provided
  Strengths: Foster reflection and development of learning plans

Portfolios
  Domain: All aspects of competence, especially appropriate for practice-based learning and improvement and systems-based practice
  Type of use: Formative and summative uses across curriculum and within clerkships and residency programs; used by some U.K. medical schools and specialty boards
  Limitations: Learner selects best case material; time-consuming to prepare and review
  Strengths: Display projects for review, foster reflection and development of learning plans

a reliability of 0.85 to 0.90.41 Under these conditions, structured assessments with the use of standardized patients are as reliable as ratings of directly observed encounters with real patients and take about the same amount of time.42

Interactions with standardized patients can be tailored to meet specific educational goals, and the actors who portray the patients can reliably rate students’ performance with respect to history taking and physical examinations. Faculty members who observe encounters with standardized patients can offer additional insights on trainees’ clinical judgment and the overall coherence of the history taking or physical examination. Unannounced standardized patients, who with the examinees’ prior approval present incognito in actual clinical settings, have been used in health services research to evaluate examinees’ diagnostic reasoning, treatment decisions, and communication skills.43-46 The use of unannounced standardized patients may prove to be particularly valuable in the assessment of higher-level trainees and physicians in practice.

The use of simulation to assess trainees’ clinical skills in intensive care and surgical settings is on the rise.47 Simulations involving sophisticated mannequins with heart sounds, respirations, oximeter readings, and pulses that respond to a variety of interventions can be used to assess how individuals or teams manage unstable vital signs. Surgical simulation centers now routinely use high-fidelity computer graphics and hands-on manipulation of surgical instruments to create a multisensory environment. High-technology simulation is seen increasingly as an important learning aid and may prove to be useful in the assessment of knowledge, clinical reasoning, and teamwork.

Multisource (“360-Degree”) Assessments

Assessments by peers, other members of the clinical team, and patients can provide insight into trainees’ work habits, capacity for teamwork, and interpersonal sensitivity.48-50 Although there are few published data on outcomes of multisource feedback in medical settings, several large programs are being developed, including one for all first- and second-year house officers in the United Kingdom and another for all physicians undergoing recertification in internal medicine in the United States. Multisource feedback is most effective when it includes narrative comments as well as statistical data, when the sources are recognized as credible, when the feedback is framed constructively, and when the entire process is accompanied by good mentoring and follow-up.51

Recent studies of peer assessments suggest that when trainees receive thoughtful ratings and comments by peers in a timely and confidential manner, along with support from advisers to help them reflect on the reports, they find the process powerful, insightful, and instructive.51,52 Peer assessments have been shown to be consistent regardless of the way the raters are selected. Such assessments are stable from year to year53 and predict subsequent class rankings as well as subsequent ratings by supervisors.54 Peer assessments depend on trust and require scrupulous attention to confidentiality. Otherwise they can be undermining, destructive, and divisive.

Although patients’ ratings of clinical performance are valuable in principle, they pose several problems. As many as 50 patient surveys may be necessary to achieve satisfactory reliability.55 Patients who are seriously ill often do not complete surveys; those who do tend to rate physicians less favorably than do patients who have milder conditions.56 Furthermore, patients are not always able to discriminate among the elements of clinical practice,57 and their ratings are typically high. These limitations make it difficult to use patient reports as the only tool for assessing clinical performance. However, ratings by nurses can be valuable. Such ratings have been found to be reliable with as few as 6 to 10 reports,58 and they correlate with both patients’ and faculty members’ ratings of the interpersonal aspects of trainees’ performance.59

Fundamental cognitive limitations in the ability of humans to know themselves as others see them restrict the usefulness of self-assessment. Furthermore, rating oneself on prior clinical performance may not achieve another important goal of self-assessment: the ability to monitor oneself from moment to moment during clinical practice.10,60 A physician must possess this ability in order to meet patients’ changing needs, to recognize the limits of his or her own competence, and to manage unexpected situations.

Portfolios

Portfolios include documentation of and reflection about specific areas of a trainee’s competence. This evidence is combined with self-reflection.61


In medicine, just as in the visual arts, portfolios demonstrate a trainee’s development and technical capacity. They can include chart notes, referral letters, procedure logs, videotaped consultations, peer assessments, patient surveys, literature searches, quality-improvement projects, and any other type of learning material. Portfolios also frequently include self-assessments, learning plans, and reflective essays. For portfolios to be maximally effective, close mentoring is required in the assembly and interpretation of the contents; considerable time can be expended in this effort. Portfolios are most commonly used in formative assessments, but their use for summative evaluations and high-stakes decisions about advancement is increasing.20

Challenges in Assessment

New Domains of Assessment

There are several domains in which assessment is in its infancy and remains problematic. Quality of care and patient safety depend on effective teamwork,62 and teamwork training is emphasized as an essential element of several areas of competence specified by the ACGME, yet there is no validated method of assessing teamwork. Experts do not agree on how to define professionalism — let alone how best to measure it.63 Dozens of scales that rate communication are used in medical education and research,64 yet there is little evidence that any one scale is better than another; furthermore, the experiences that patients report often differ considerably from ratings given by experts.65

Multimethod and Longitudinal Assessment

The use of multiple methods of assessment can overcome many of the limitations of individual assessment formats.8,22,36,66 Variation of the clinical context allows for broader insights into competence, the use of multiple formats provides greater variety in the areas of content that are evaluated, and input from multiple observers provides information on distinct aspects of a trainee’s performance. Longitudinal assessment avoids excessive testing at any one point in time and serves as the foundation for monitoring ongoing professional development.

In the example at the beginning of this article, a multimethod assessment might include direct observation of the student interacting with several patients at different points during the rotation, a multiple-choice examination with both “key features” and “script-concordance” items to assess clinical reasoning, an encounter with a standardized patient followed by an oral examination to assess clinical skills in a standardized setting, written essays that would require literature searches and synthesis of the medical literature on the basic science or clinical aspects of one or more of the diseases the student encountered, and peer assessments to provide insights into interpersonal skills and work habits.

The combination of all these results into a portfolio resembles the art of diagnosis; it demands that the student synthesize various bits and types of information in order to come up with an overall picture. Although a few medical schools have begun to institute longitudinal assessments that use multiple methods,8 the best way to deal with the quantity and the qualitatively different types of data that the process generates is not yet clear. New ways of combining qualitative and quantitative data will be required if portfolio assessments are to find widespread application and withstand the test of time.

Standardization of Assessment

Although accrediting organizations specify broad areas that the curriculum should cover and assess, for the most part individual medical schools make their own decisions about methods and standards of assessment. This model may have the advantage of ensuring consistency between the curriculum and assessment, but it also makes it difficult to compare students across medical schools for the purpose of subsequent training.67 The ideal balance between nationally standardized and school-specific assessment remains to be determined. Furthermore, within a given medical school, all students may not require the same package of assessments — for example, initial screening examinations may be followed by more extensive testing for those who have difficulties.

Assessment and Learning

It is generally acknowledged that assessment drives learning; however, assessment can have both intended and unintended consequences.22 Students study more thoughtfully when they anticipate certain examination formats,68 and changes in the format can shift their focus to clinical rather than theoretical issues.69 Assessment by peers seems


to promote professionalism, teamwork, and communication.52 The unintended effects of assessment include the tendency for students to cram for examinations and to substitute superficial knowledge for reflective learning.

Assessment of Expertise

The assessment of trainees and physicians who have higher levels of expertise presents particular challenges. Expertise is characterized by unique, elaborated, and well-organized bodies of knowledge that are often revealed only when they are triggered by characteristic clinical patterns.70,71 Thus, experts who are unable to access their knowledge in artificial testing situations but who make sound judgments in practice may do poorly on some tests that are designed to assess communication skills, knowledge, or reasoning. Furthermore, clinical expertise implies the practical wisdom to manage ambiguous and unstructured problems, balance competing explanations, avoid premature closure, note exceptions to rules and principles, and — even when under stress — choose one of the several courses of action that are acceptable but imperfect. Testing either inductive thinking (the organization of data to generate possible interpretations) or deductive thinking (the analysis of data to discern among possibilities) in situations in which there is no consensus on a single correct answer presents formidable psychometric challenges.

Assessment and Future Performance

The evidence that assessment protects the public from poor-quality care is both indirect and scarce; it consists of a few studies that show correlations between assessment programs that use multiple methods and relatively crude estimates of quality such as diagnostic testing, prescribing, and referral patterns.72 Correlating assessment with future performance is difficult not only because of inadequacies in the assessment process itself but also because relevant, robust measures of outcome that can be directly attributed to the effects of training have not been defined. Current efforts to measure the overall quality of care include patient surveys and analyses of institutional and practice databases. When these new tools are refined, they may provide a more solid foundation for research on educational outcomes.

Conclusions

Considering all these challenges, current assessment practices would be enhanced if the principles summarized in Table 2 were kept clearly in mind. The content, format, and frequency of assessment, as well as the timing and format of feedback, should follow from the specific goals of the medical education program. The various domains of competence should be assessed in an integrated, coherent, and longitudinal fashion with the use of multiple methods and provision of frequent and constructive feedback. Educators should be mindful of the impact of assessment on learning, the potential unintended effects of assessment, the limitations of each method (including cost),

Table 2. Principles of Assessment.

Goals of assessment
  Provide direction and motivation for future learning, including knowledge, skills, and professionalism
  Protect the public by upholding high professional standards and screening out trainees and physicians who are incompetent
  Meet public expectations of self-regulation
  Choose among applicants for advanced training

What to assess
  Habits of mind and behavior
  Acquisition and application of knowledge and skills
  Communication
  Professionalism
  Clinical reasoning and judgment in uncertain situations
  Teamwork
  Practice-based learning and improvement
  Systems-based practice

How to assess
  Use multiple methods and a variety of environments and contexts to capture different aspects of performance
  Organize assessments into repeated, ongoing, contextual, and developmental programs
  Balance the use of complex, ambiguous real-life situations requiring reasoning and judgment with structured, simplified, and focused assessments of knowledge, skills, and behavior
  Include directly observed behavior
  Use experts to test expert judgment
  Use pass–fail standards that reflect appropriate developmental levels
Provide timely feedback and mentoring and the prevailing culture of the program or in-
Cautions stitution in which the assessment is occurring.
Be aware of the unintended effects of testing
Avoid punishing expert physicians who use shortcuts
Assessment is entering every phase of profes-
Do not assume that quantitative data are more reliable, valid, or useful than sional development. It is now used during the
qualitative data medical school application process,73 at the start
of residency training,74 and as part of the “main-

394 n engl j med 356;4 www.nejm.org january 25, 2007

The New England Journal of Medicine


Downloaded from nejm.org on March 1, 2011. For personal use only. No other uses without permission.
Copyright © 2007 Massachusetts Medical Society. All rights reserved.
Medical Education

tenance of certification” requirements that several work, and expertise that have been difficult to
medical boards have adopted.75 Multiple methods define and quantify.
of assessment implemented longitudinally can No potential conflict of interest relevant to this article was
reported.
provide the data that are needed to assess train- I thank Tana Grady-Weliky, M.D., Stephen Lurie, M.D., Ph.D.,
ees’ learning needs and to identify and remediate John McCarthy, M.S., Anne Nofziger, M.D., and Denham Ward,
suboptimal performance by clinicians. Decisions M.D., Ph.D., for helpful suggestions.
about whether to use formative or summative
assessment formats, how frequently assessments A video showing an exami-
should be made, and what standards should be nation involving a standard-
ized patient is available with
in place remain challenging. Educators also face
the full text of this article at
the challenge of developing tools for the assess- www.nejm.org.
ment of qualities such as professionalism, team-
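The psychometric challenge raised under Assessment of Expertise — scoring judgment on items for which experts themselves disagree — has at least one concrete answer in the cited literature: the script concordance test (references 27 and 28) credits an examinee in proportion to how many members of an expert reference panel chose the same response, rather than against a single keyed answer. The sketch below illustrates that aggregate-scoring idea only; the function name and panel data are hypothetical and do not reproduce the published scoring procedure.

```python
from collections import Counter

def concordance_score(examinee_choice, panel_choices):
    """Credit in proportion to panel agreement: the modal panel response
    earns full credit (1.0); a response no panelist chose earns 0.0."""
    counts = Counter(panel_choices)
    modal_count = max(counts.values())
    return counts.get(examinee_choice, 0) / modal_count

# Hypothetical item: 10 panelists rate, on a -2 to +2 scale, how a new
# clinical finding changes the likelihood of a diagnosis; no consensus exists.
panel = [1, 1, 1, 1, 1, 2, 2, 0, 0, -1]
print(concordance_score(1, panel))   # modal response -> 1.0
print(concordance_score(2, panel))   # minority response -> 0.4
print(concordance_score(-2, panel))  # unchosen response -> 0.0
```

Partial credit for defensible minority answers is what distinguishes this approach from conventional single-best-answer scoring, at the cost of requiring a reference panel for every item.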

References

1. Epstein RM, Hundert EM. Defining and assessing professional competence. JAMA 2002;287:226-35.
2. Batalden P, Leach D, Swing S, Dreyfus H, Dreyfus S. General competencies and accreditation in graduate medical education. Health Aff (Millwood) 2002;21(5):103-11.
3. Leach DC. Competence is a habit. JAMA 2002;287:243-4.
4. Fraser SW, Greenhalgh T. Coping with complexity: educating for capability. BMJ 2001;323:799-803.
5. Klass D. Reevaluation of clinical competency. Am J Phys Med Rehabil 2000;79:481-6.
6. Bordage G, Zacks R. The structure of medical knowledge in the memories of medical students and general practitioners: categories and prototypes. Med Educ 1984;18:406-16.
7. Gruppen LD, Frohna AZ. Clinical reasoning. In: Norman GR, Van Der Vleuten CP, Newble DI, eds. International handbook of research in medical education. Part 1. Dordrecht, the Netherlands: Kluwer Academic, 2002:205-30.
8. Epstein RM, Dannefer EF, Nofziger AC, et al. Comprehensive assessment of professional competence: the Rochester experiment. Teach Learn Med 2004;16:186-96.
9. Ericsson KA. Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Acad Med 2004;79:Suppl:S70-S81.
10. Epstein RM. Mindful practice. JAMA 1999;282:833-9.
11. Schon DA. Educating the reflective practitioner. San Francisco: Jossey-Bass, 1987.
12. Epstein RM. Mindful practice in action. II. Cultivating habits of mind. Fam Syst Health 2003;21:11-7.
13. Dreyfus HL. On the Internet (thinking in action). New York: Routledge, 2001.
14. Eraut M. Learning professional processes: public knowledge and personal experience. In: Eraut M, ed. Developing professional knowledge and competence. London: Falmer Press, 1994:100-22.
15. Shanafelt TD, Bradley KA, Wipf JE, Back AL. Burnout and self-reported patient care in an internal medicine residency program. Ann Intern Med 2002;136:358-67.
16. Borrell-Carrio F, Epstein RM. Preventing errors in clinical practice: a call for self-awareness. Ann Fam Med 2004;2:310-6.
17. Leung WC. Competency based medical training: review. BMJ 2002;325:693-6.
18. Friedman Ben-David M. The role of assessment in expanding professional horizons. Med Teach 2000;22:472-7.
19. Sullivan W. Work and integrity: the crisis and promise of professionalism in America. 2nd ed. San Francisco: Jossey-Bass, 2005.
20. Schuwirth L, van der Vleuten C. Merging views on assessment. Med Educ 2004;38:1208-10.
21. Wass V, Van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. Lancet 2001;357:945-9.
22. Van Der Vleuten CPM. The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ 1996;1:41-67.
23. Schuwirth LW, van der Vleuten CP. Different written assessment methods: what can be said about their strengths and weaknesses? Med Educ 2004;38:974-9.
24. Schuwirth LW, Verheggen MM, van der Vleuten CP, Boshuizen HP, Dinant GJ. Do short cases elicit different thinking processes than factual knowledge questions do? Med Educ 2001;35:348-56.
25. Case S, Swanson D. Constructing written test questions for the basic and clinical sciences. 3rd ed. Philadelphia: National Board of Medical Examiners, 2000.
26. Farmer EA, Page G. A practical guide to assessing clinical decision-making skills using the key features approach. Med Educ 2005;39:1188-94.
27. Charlin B, Roy L, Brailovsky C, Goulet F, van der Vleuten C. The Script Concordance test: a tool to assess the reflective clinician. Teach Learn Med 2000;12:189-95.
28. Brailovsky C, Charlin B, Beausoleil S, Cote S, Van der Vleuten C. Measurement of clinical reflective capacity early in training as a predictor of clinical reasoning performance at the end of residency: an experimental study on the script concordance test. Med Educ 2001;35:430-6.
29. Frederiksen N. The real test bias: influences of testing on teaching and learning. Am Psychol 1984;39:193-202.
30. Schuwirth LW, van der Vleuten CP, Donkers HH. A closer look at cueing effects in multiple-choice questions. Med Educ 1996;30:44-9.
31. Friedman MH, Connell KJ, Olthoff AJ, Sinacore JM, Bordage G. Medical student errors in making a diagnosis. Acad Med 1998;73:Suppl:S19-S21.
32. Graber ML, Franklin N, Gordon R. Diagnostic error in internal medicine. Arch Intern Med 2005;165:1493-9.
33. Pulito AR, Donnelly MB, Plymale M, Mentzer RM Jr. What do faculty observe of medical students’ clinical performance? Teach Learn Med 2006;18:99-104.
34. Norman G. The long case versus objective structured clinical examinations. BMJ 2002;324:748-9.
35. Norcini JJ, Blank LL, Duffy FD, Fortna GS. The mini-CEX: a method for assessing clinical skills. Ann Intern Med 2003;138:476-81.
36. Van der Vleuten CP, Norman GR, De Graaff E. Pitfalls in the pursuit of objectivity: issues of reliability. Med Educ 1991;25:110-8.
37. Anastakis DJ, Cohen R, Reznick RK. The structured oral examination as a method for assessing surgical residents. Am J Surg 1991;162:67-70.
38. Ram P, Grol R, Rethans JJ, Schouten B, Van der Vleuten C, Kester A. Assessment of general practitioners by video observation of communicative and medical performance in daily practice: issues of validity, reliability and feasibility. Med Educ 1999;33:447-54.
39. Papadakis MA. The Step 2 clinical-skills examination. N Engl J Med 2004;350:1703-5.
40. Hodges B, McIlroy JH. Analytic global OSCE ratings are sensitive to level of training. Med Educ 2003;37:1012-6.
41. Reznick RK, Blackmore D, Cohen R, et al. An objective structured clinical examination for the licentiate of the Medical Council of Canada: from research to reality. Acad Med 1993;68:Suppl:S4-S6.
42. Wass V, Jones R, Van der Vleuten C. Standardized or real patients to test clinical competence? The long case revisited. Med Educ 2001;35:321-5.
43. Carney PA, Eliassen MS, Wolford GL, Owen M, Badger LW, Dietrich AJ. How physician communication influences recognition of depression in primary care. J Fam Pract 1999;48:958-64.
44. Epstein RM, Franks P, Shields CG, et al. Patient-centered communication and diagnostic testing. Ann Fam Med 2005;3:415-21.
45. Tamblyn RM. Use of standardized patients in the assessment of medical practice. CMAJ 1998;158:205-7.
46. Kravitz RL, Epstein RM, Feldman MD, et al. Influence of patients’ requests for direct-to-consumer advertised antidepressants: a randomized controlled trial. JAMA 2005;293:1995-2002. [Erratum, JAMA 2005;294:2436.]
47. Reznick RK, MacRae H. Teaching surgical skills — changes in the wind. N Engl J Med 2006;355:2664-9.
48. Ramsey PG, Wenrich MD, Carline JD, Inui TS, Larson EB, LoGerfo JP. Use of peer ratings to evaluate physician performance. JAMA 1993;269:1655-60.
49. Dannefer EF, Henson LC, Bierer SB, et al. Peer assessment of professional competence. Med Educ 2005;39:713-22.
50. Violato C, Marini A, Toews J, Lockyer J, Fidler H. Feasibility and psychometric properties of using peers, consulting physicians, co-workers, and patients to assess physicians. Acad Med 1997;72:Suppl 1:S82-S84.
51. Norcini JJ. Peer assessment of competence. Med Educ 2003;37:539-43.
52. Nofziger AC, Davis B, Naumburg EH, Epstein RM. The impact of peer assessment on professional development. Presented at the Ottawa Conference on Medical Education and Assessment, Ottawa, July 15, 2002. abstract.
53. Lurie SJ, Nofziger AC, Meldrum S, Mooney C, Epstein RM. Temporal and group-related trends in peer assessment amongst medical students. Med Educ 2006;40:840-7.
54. Lurie S, Lambert D, Grady-Weliky TA. Relationship between dean’s letter rankings, peer assessment during medical school, and ratings by internship directors (in press).
55. Calhoun JG, Woolliscroft JO, Hockman EM, Wolf FM, Davis WK. Evaluating medical student clinical skill performance: relationships among self, peer, and expert ratings. Proc Annu Conf Res Med Educ 1984;23:205-10.
56. Hall JA, Milburn MA, Roter DL, Daltroy LH. Why are sicker patients less satisfied with their medical care? Tests of two explanatory models. Health Psychol 1998;17:70-5.
57. Chang JT, Hays RD, Shekelle PG, et al. Patients’ global ratings of their health care are not associated with the technical quality of their care. Ann Intern Med 2006;144:665-72.
58. Butterfield PS, Mazzaferri EL. A new rating form for use by nurses in assessing residents’ humanistic behavior. J Gen Intern Med 1991;6:155-61.
59. Klessig J, Robbins AS, Wieland D, Rubenstein L. Evaluating humanistic attributes of internal medicine residents. J Gen Intern Med 1989;4:514-21.
60. Eva KW, Regehr G. Self-assessment in the health professions: a reformulation and research agenda. Acad Med 2005;80:Suppl:S46-S54.
61. Carraccio C, Englander R. Evaluating competence using a portfolio: a literature review and Web-based application to the ACGME competencies. Teach Learn Med 2004;16:381-7.
62. Committee on Quality of Health Care in America. Crossing the quality chasm: a new health system for the 21st century. Washington, DC: National Academy Press, 2001.
63. Ginsburg S, Regehr G, Lingard L. Basing the evaluation of professionalism on observable behaviors: a cautionary tale. Acad Med 2004;79:Suppl:S1-S4.
64. Schirmer JM, Mauksch L, Lang F, et al. Assessing communication competence: a review of current tools. Fam Med 2005;37:184-92.
65. Epstein RM, Franks P, Fiscella K, et al. Measuring patient-centered communication in patient-physician consultations: theoretical and practical issues. Soc Sci Med 2005;61:1516-28.
66. Norman GR, Van der Vleuten CP, De Graaff E. Pitfalls in the pursuit of objectivity: issues of validity, efficiency and acceptability. Med Educ 1991;25:119-26.
67. Colliver JA, Vu NV, Barrows HS. Screening test length for sequential testing with a standardized-patient examination: a receiver operating characteristic (ROC) analysis. Acad Med 1992;67:592-5.
68. Hakstian RA. The effects of type of examination anticipated on test preparation and performance. J Educ Res 1971;64:319-24.
69. Newble DI, Jaeger K. The effect of assessments and examinations on the learning of medical students. Med Educ 1983;17:165-71.
70. Schmidt HG, Norman GR, Boshuizen HP. A cognitive perspective on medical expertise: theory and implication. Acad Med 1990;65:611-21. [Erratum, Acad Med 1992;67:287.]
71. Bowen JL. Educational strategies to promote clinical diagnostic reasoning. N Engl J Med 2006;355:2217-25.
72. Tamblyn R, Abrahamowicz M, Dauphinee WD, et al. Association between licensure examination scores and practice in primary care. JAMA 2002;288:3019-26.
73. Eva KW, Reiter HI, Rosenfeld J, Norman GR. The ability of the multiple mini-interview to predict preclerkship performance in medical school. Acad Med 2004;79:Suppl:S40-S42.
74. Lypson ML, Frohna JG, Gruppen LD, Woolliscroft JO. Assessing residents’ competencies at baseline: identifying the gaps. Acad Med 2004;79:564-70.
75. Batmangelich S, Adamowski S. Maintenance of certification in the United States: a progress report. J Contin Educ Health Prof 2004;24:134-8.

Copyright © 2007 Massachusetts Medical Society.


