
EDUCATION THEORY
MADE PRACTICAL VOLUME 3

Robinson | Chan | Krzyzaniak | Gottlieb | Schnapp | Spector | Papanagnou


A Project of the Faculty Incubator
Academic Life in Emergency Medicine

COPYRIGHT

Education Theory Made Practical: Volume 3


Published by Academic Life in Emergency Medicine,
San Francisco, California, USA.

First edition, October 2020.

Available for usage under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

ISBN: 978-0-9992825-7-1

EDUCATION THEORY
MADE PRACTICAL
VOLUME 3

Editors

Daniel Robinson, MD
Teresa M. Chan, MD, MHPE
Sara Krzyzaniak, MD
Michael Gottlieb, MD
Benjamin H. Schnapp, MD, MHPE
Jordan Spector, MD
Dimitrios Papanagnou, MD, MPH

DEDICATION
In the time of the global pandemic, the #FOAMed community has shown its strength, breadth, and depth in creating timely new educational content to support frontline workers and educators as they adapt to highly unusual circumstances. We would like to take this dedication to thank all the tireless emergency medical providers, physicians, nurses, respiratory therapists, healthcare aides, cleaners, food services staff, administrators, and all others helping to combat the ongoing pandemic.

In particular, we would like to thank the Academic Life in Emergency Medicine (ALiEM) team for their continued support, especially Dr. Michelle Lin for her inspired leadership and tireless work during these trying times to support our community-at-large. We know that initiatives like the Teaching in the Time of COVID-19 blog series have been highly useful. The relaunch of the ALiEMU platform, which further supports trainees and educators with free, high-value, accessible content, is also of great benefit.

Finally, thank you to all the participants of the 2018-2019 class of the ALiEM Faculty Incubator. Your persistence and hard work have resulted in this book, which allows educators to easily find, use, and understand medical education theories and frameworks. We are so proud of all the work you did for this scholarly book, and we are sure that it will go on to help many other junior clinician educators in the future. We are proud of all that you accomplished in 2018-2019, and we look forward to seeing all that you will do going forward!

Daniel Robinson, MD
Teresa M. Chan, MD, MHPE
Sara Krzyzaniak, MD
Michael Gottlieb, MD
Benjamin H. Schnapp, MD, MHPE
Jordan Spector, MD
Dimitrios Papanagnou, MD, MPH

ABOUT THIS BOOK

Education Theory Made Practical (Volume 3) continues our case-based discussion of core theories and frameworks in medical education. A collaborative project between Academic Life in Emergency Medicine (ALiEM.com) and the International Clinician Educators (ICE) blog, this project is meant to help beginning clinician educators gain a sense of how education theory can apply to their daily practice.

Each chapter was written, edited, and released on the ICE blog over a four-month period, during which peer review was sought and subsequently incorporated into this final version.

Each chapter begins with a common case facing educators, followed by a discussion of the theory itself and its modern applications; finally, the case is closed by discussing how the specific theory can be applied to the learner. An annotated bibliography is also included to provide the reader with additional resources for further learning. Each chapter can be read independently or in series at the reader's preference.

Since these materials were originally derived as part of the Free Open
Access Medical Education (FOAM or #FOAMed) movement, we are
committed to distributing this resource as a free ebook.

Purpose
The Education Theory Made Practical ebook is designed to provide an efficient primer on ten core edu-
cation theories that can be applied by the reader in a practical manner, while also providing a resource
for identifying further relevant literature.

Usage
This document is licensed for use under the following Creative Commons license:
Attribution-NonCommercial-NoDerivs 3.0 Unported.

Where can I find this online?


The ALiEM Education Theory Made Practical series can be found online at:
https://www.aliem.com/library

Editors
Daniel Robinson, MD
Teresa M. Chan, MD, MHPE
Sara Krzyzaniak, MD
Michael Gottlieb, MD
Benjamin H. Schnapp, MD, MHPE
Jordan Spector, MD
Dimitrios Papanagnou, MD, MPH

Copy and Layout Editor


Janatani Balakumaran, MD

Foreword
David Sklar, MD

Chapter Authors
Chapter 1 Kern's Model of Curriculum Development
Authors: Chris Lloyd, DO; Simiao Li-Sauerwine, MD; Shannon McNamara, MD
Editor: Benjamin H. Schnapp, MD

Chapter 2 The Kirkpatrick Model: Four Levels for Evaluating Learning
Authors: Christopher Fowler, DO; Lisa Hoffman, DO; Shreya Trivedi, MD; Amanda Young, MD
Editor: Dimitrios Papanagnou, MD, MPH

Chapter 3 Realist Evaluation
Authors: Jason An, MD; Christine Stehman, MD; Randy Sorge, MD
Editor: Jordan Spector, MD

Chapter 4 Mastery Learning
Authors: Michael Barrie, MD; Shawn Dowling, MD, FRCPC; Nicole Rocca, MD, FRCPC
Editor: Jordan Spector, MD

Chapter 5 Cognitive Theory of Multimedia Learning
Authors: Laurie Mazurik, MD; Elissa Moore, DO; Megan Stobart-Gallagher, DO; Quinn Wicks, MD
Editor: Daniel W. Robinson, MD

Chapter 6 Validity
Authors: Rebecca Shaw, MBBS; Carly Silvester, MBBS
Editor: Dimitrios Papanagnou, MD, MPH

Chapter 7 Programmatic Assessment
Authors: Elizabeth Dubey, MD; Christian Jones, MD; Annahieta Kalantari, DO
Editor: Sara M. Krzyzaniak, MD

Chapter 8 Self-Assessment Seeking
Authors: Nilantha Lenora, MD; Layla Abubshait, MD; Manu Ayyan, MBBS
Editors: Benjamin H. Schnapp, MD, MEd; Teresa M. Chan, MD, MHPE

Chapter 9 Bolman & Deal Four-Frame Model
Authors: Lexie Mannix, MD; Shawn Mondoux, MD; David Story, MD
Editor: Michael Gottlieb, MD

Chapter 10 Kotter's Stages of Change
Authors: Dallas Holladay, DO; Melissa Parsons, MD; Gannon Sungar, DO
Editor: Daniel W. Robinson, MD

FOREWORD

I write this foreword to the third volume of Educational Theory Made Practical as the COVID-19 pandemic circles the earth, visiting countries and populations with variable effects and altering health care delivery and health professions education priorities across the globe. Now more than ever we need to understand how to develop new curricula to teach our health professionals the new information and skills they need to combat and prevent this disease, and how to assess what new skills they may need. Educational theory informs our educational efforts by providing a framework for addressing each element of health professions education. In this third volume of Educational Theory Made Practical, as in the two previous volumes, there are short presentations of important theories that help educators at all levels find and use theories to address the problems they are confronting. There are short introductions to the topic and cases that illustrate how the theory might be applied, followed by a summary of the theory's development, its modification over time, and some recent literature that discusses it. This approach provides an introduction to the theory for those unfamiliar with it and a revisiting of key concepts for those who have used it in the past but would like to refresh their knowledge.

I noticed as I reviewed Volume Three that there were several topics that I used frequently when I was Editor-in-Chief of the journal Academic Medicine. For example, I used the Kirkpatrick Model when reviewing innovation reports to help me analyze the evaluation of the innovation. Did the innovation improve knowledge or skills? Did it also change behavior? The Kirkpatrick Model provided a framework for me to think about submissions on new projects in health professions education. I also used it to help authors improve the presentation of their work. While the theory was originally developed for use in business, it has also been very valuable in health professions education and has been modified over time to improve it. Another topic that appears in Volume 3 is Kotter's Stages of Change. This theory was also developed for business initially but has great relevance to change in health professions education and in the administration of health sciences centers. It is particularly relevant during the COVID-19 pandemic, as we have had to make urgent changes in patient care to address COVID-19 and to adjust our educational programs to reduce the risk of infection. A third topic that I have referred to frequently is Mastery Learning, which has appeared increasingly as health professions education has moved from a time-based approach to a competency-based approach. Mastery Learning is a theory that helps us develop programs for learners of different capabilities so that most can succeed given adequate time, support, and clear expectations.

Many of us who teach in medical schools may resist engagement in health professions
education theory because we think it is not practical and a waste of time. We may think
we can base our teaching on observations of our students and their gaps in knowledge.
But the COVID-19 pandemic has shown us that our students need to have the skills to
find new information on their own and integrate it into a new mental model for health-
care and population health. Research is constantly providing new content in the bio-
and health sciences that must be assessed and incorporated or we can become stuck in
the past practicing out of date medicine. Educational theory can help us to develop
health care professionals who can continue to learn and adapt to a changing set of con-
ditions and diseases and integrate new information into older concepts.

When I hear disparaging comments about health professions education theory, I am reminded of the way that various theories have changed human history, from Darwin's Theory of Evolution and Natural Selection to Einstein's Theory of Relativity. Theories lead us to research that confirms, modifies, or rejects the theory, which in turn can produce better theories and improvements in human knowledge. I encourage anyone who would consider a career in health professions education to become familiar with the key theories in our field. Most scholarship in health professions education begins with an examination of the theory that underlies how we understand a problem and our gaps in knowledge about the problem. Reviewers and editors expect that serious
scholarship will include a description of the theoretical constructs that underpin pub-
lished educational works. The Educational Theory Made Practical series is an excellent
starting point for serious scholarship as well as an accessible review for presentations
and lectures on health professions education for clinical faculty, education fellows and
other advanced education students.

David Sklar, MD

Professor, Arizona State University

Past Editor-in-Chief of Academic Medicine (January 2012 to January 2020)

CHAPTER 1

Six Steps Model of Curriculum Development

Authors: Chris Lloyd, DO; Simiao Li-Sauerwine, MD, MS; Shannon McNamara, MD
Editor: Benjamin H. Schnapp, MD, MHPE

A Case
Sally is a junior faculty member who has been tasked by her residency program director with
developing an EKG curriculum for incoming Emergency Medicine (EM) interns. She has never had
the opportunity to develop a course before. She thinks back to how she gained experience reading
EKGs herself: on-shift experiential learning without a formal curriculum. She is sure that she can do
better for her learners, but has a lot of questions about how to proceed. Various questions run
through her brain…

• How can she determine what content is appropriate?

• How can she include high-yield cases that are correctly tailored to her audience?

• What format should she use?

• How will she know if her curriculum was successful?

Sally is not sure how to proceed in order to create the optimal curriculum.

OVERVIEW | SIX STEPS MODEL
Kern and Thomas' Curriculum Development for Medical Education is designed for use by medical educators as a framework for the creation of educational experiences.1 Their model is organized into six steps:

Problem Identification/General Needs Assessment: What health care problem are you trying to solve? What
gap requires attention? Once the need is identified, consider: What is the current approach? What is the
ideal approach? Typically this will involve a review of the literature describing current strategies used
to approach this problem.

Targeted Needs Assessment: What is needed here? This assessment analyzes the learners involved in the curriculum within their unique environment and considers the local resources that may be required. What do the learners already know? Quantitative and/or qualitative data can be collected during this step.

Goals and Objectives: What are you going to accomplish? This step communicates to others what your curriculum is about. While goals can be more generalized, learning objectives must be specific and measurable: who will do how much of what, by when?

Educational Strategies: What methods will you use? The methods employed depend on the objectives developed previously. Some objectives may be more amenable to simulation, while others may be best achieved through small group discussion or asynchronous online resources. Including multiple strategies may help increase knowledge retention.

Implementation: How do I put it all together? You must obtain local support for the curriculum, decide
what resources are required, and determine barriers to overall success before starting. This phase often
involves piloting your curriculum and a phase-in period before full administration, with adjustments
based on what is learned.

Measuring Outcomes: What worked and what didn’t? The assessment phase will not only target the
individuals participating in the curriculum but also the program itself. Evaluations can be formative
(ongoing feedback for improvement) or summative (e.g. a “grade”). This step is also important for
documentation of your achievements.

MAIN ORIGINATORS OF THE THEORY


•David Kern
•Patricia Thomas

Background
Kern and Thomas’ Six Steps of Curriculum Development emphasize that there is much more
to quality teaching than throwing together a slide set or simulation case.

Designing a great curriculum is dependent on first having a clearly defined problem and
then seeking out effective solutions. What are other institutions or organizations doing to
solve the problem? What is the perspective of patients, learners, and educators? What is
the ideal approach? This is an opportunity to gather evidence-based reviews, best practice
guidelines, and original research that is applicable to your needs in order to find the best
approach for your learners. For the intern EKG course from the introductory scenario, the Accreditation Council for Graduate Medical Education (ACGME) and the Residency
Review Committee for Emergency Medicine (RRC-EM) both have core competencies that
must be demonstrated in EKG interpretation. These competencies should be considered
and included in the general needs assessment.

Once the problem is defined, one must determine which group is most affected by the learning gap - that is the targeted learner! This may be patients, providers, educators, or students. One may use surveys, informal discussions, formal interviews, tests, or
questionnaires to discover where specific gaps exist in your local population. In the EKG
example above, faculty might use test scores, Clinical Competency Committee reviews of
resident performance, and informal discussion with other faculty of knowledge gaps in
EKG interpretation to create a targeted needs analysis.

The goals of a curriculum will communicate the overall purpose to others. While a broad
vision of developing a foundation for EKG interpretation in first year EM residents may be
a worthy goal for a curriculum, the objectives must be specific and measurable. Bloom’s
taxonomy can be helpful here to provide useful verbs to describe desired actions.
Objectives should be made with the individual learner as well as the overall program in
mind. A cognitive learner-focused objective may involve a resident’s ability to describe the
differential for ST-elevation. An aggregate program objective may aim for a specific
percentage of residents in the program being able to do so. Aggregate objectives may also
incorporate performance improvement on standardized assessments or simulation cases.
Use of core competencies in the targeted needs assessment creates a smooth transition to
these objectives. For example, a curriculum focused on EKG interpretation will encompass
many of the competencies of patient care and systems-based practice.

The educational content of the curriculum in combination with the objectives will comprise
the syllabus. This will include the educational resources required: articles, texts, simulation
exercises, discussion sessions, and the planned assessment. Multiple methods should be
employed to enhance knowledge retention. Be sure to match objectives with appropriate educational methods. Cognitive or knowledge-based objectives may be more congruent
with lectures, online resources, and team-based learning. Psychomotor skills may be best
approached with supervised clinical experiences and simulation.

Any curriculum requires resources for success, and implementation will require
identification of what curricular time, faculty commitment, and monetary resources are
needed. How often will interns need to meet for the EKG course? How many faculty
members are needed to execute the curriculum? Minimal faculty development would be anticipated for an intern-level EKG course, but other curricular ideas may involve additional preparation. Regardless, an orientation to the new curriculum should be provided to ensure faculty are prepared. Keep in mind that the time required is not solely the face-to-face experience with the learners! Educators should budget time for pre-session preparation of materials, feedback for learners, and overall evaluation of the curriculum.

The evaluation of the curriculum and feedback provided to both learners and educators is
the final step in the continuous loop of curriculum development. Were the goals and
objectives of the curriculum met? For the EKG course example, one would likely target
learners and faculty in order to assess the performance of the curriculum. Formative
feedback may be provided throughout the course followed by a summative ‘grade’ at the
time of completion. Both evaluations should align with the core competencies/objectives
decided on previously. Inherent in this step is deciding on appropriate evaluation questions
and design. Once the assessment tool has been created and used, the data should be
analyzed, and reported back to the key stakeholders in the curriculum.

Successful curricula will always be in a state of constant development and change. The
steps outlined by Kern and Thomas are not meant to be used strictly in a static sequence. A
targeted needs assessment and goal/objective creation can be done simultaneously.
Resource limitation, whether financial or otherwise, may lead to alteration of the
educational strategies employed. Additionally, the importance of continual program
evaluation in order to provide objective feedback on how the curriculum is functioning
cannot be overemphasized.

Modern takes on this Theory


Crowd-sourced curriculum development

Shappell and colleagues reviewed online education content available for each step in
Kern’s framework.2 They found that educational content and needs assessments are well
represented. Development of goals/objectives and of program evaluation/assessment tools is sparser, however, and is an area ripe for further innovation. While many fields have expanded in recent years to large online learning centers, graduate medical education (GME) has not yet fully embraced this model, as the creation of these centers is resource-intensive.

Competency-based medical education

GME is shifting from a fixed training period to a competency-based model in which a resident must demonstrate specific knowledge and skills, and show that they can apply them independently, in order to graduate.3 Acquisition of this expertise is
independent of the duration of training. This evolution of the GME model fits in well with
steps two and three of Kern’s model. The Targeted Needs Assessment can be generated by
reviewing what competencies are not yet being met. Specific curriculum objectives can
then mirror the competencies being addressed. One example of such an approach
examined the competency levels achieved by residents before entering into an EM
residency program.4 Kern’s framework was used to develop an EM orientation program
based on this needs assessment.

Other Examples of Where this Theory Might Apply


The following are some worked examples of how this framework might apply in curricular
development:

Example 1: While examples of curricular development in GME tend to focus on students, we must also remember that as health care professionals we are also responsible for providing education to patients and families. A group of residents completing a global health elective used The Six Steps Model of Curriculum Development to create educational products for clinicians and patients that were implemented locally and abroad. Residents were introduced to Kern's curriculum development structure, and they then identified a health care problem in the community, such as pediatric dental hygiene. Subsequent steps led to the creation of a 1-hour program on the relationship between hygiene and health that was delivered to local refugee groups and international clinics.5

Example 2: The Six Steps Model of Curriculum Development is also applicable in partnering
with allied health professionals for professional training. For example, ultrasound-guided
peripheral intravenous (IV) line placement is a common procedure in the Emergency
Department for patients with difficult IV access. Using the curriculum development process outlined here, courses have been developed to teach patient care technicians to place ultrasound-guided IVs as skillfully as physicians.6

ANNOTATED BIBLIOGRAPHY
Thomas PA, Kern DE, Hughes MT, Chen BY. Curriculum Development for Medical
Education: a Six-Step Approach. Baltimore: Johns Hopkins University Press; 2016.1

This text is the complete guide to the Six-Step Model of Curriculum Development. It thoroughly discusses each of the six steps as they apply to patient and clinician education.

Barsuk JH, Cohen ER, Wayne DB, Siddall VJ, McGaghie WC. Developing a Simulation-
Based Mastery Learning Curriculum. Simulation in Healthcare: The Journal of the Society
for Simulation in Healthcare. 2016;11(1):52-59.7

This paper uses both the Six-Step Model for Curriculum Development and mastery learning techniques
to create a robust simulation-based ACLS curriculum. This model is a helpful framework to consider
when building similar curricula.

Lucas R, Choudhri T, Roche C, Ranniger C, Greenberg L. Developing a Curriculum for


Emergency Medicine Residency Orientation Programs. Journal of Emergency Medicine.
2014;46(5):701-705. doi:10.1016/j.jemermed.2013.08.132.4

The Six-Step Model of Curriculum Development is used here to work through a complete curriculum development process for an Emergency Medicine intern orientation. This is a particularly accessible and realistic application of this curricular design model in a common educational situation.

Sherbino J. Educational design: a CanMEDS guide for the health professions. Royal College
of Physicians and Surgeons of Canada; 2011.8

This book serves not only as a practical manual for curriculum developers but also as a look beyond the Six-Step Model of Curriculum Development, focusing on topics like change management techniques to ensure that curricular innovations take hold in their environment.

Limitations of this Theory


The most obvious limitation of this theory is the time-intensive approach it requires. The
overall effectiveness of The Six Steps Model of Curriculum Development is highly dependent
on the educator demonstrating a thorough commitment to each step.

While The Six Steps Model of Curriculum Development is described linearly, in reality, its steps
occur in a dynamic way as the curriculum evolves. Though robust needs assessment and
post-implementation data may not always be available, it is still important to include these
elements in some way to augment the effectiveness of the curriculum implemented.

Finally, one of the key limitations of this model is that it has a tendency to confuse new educators because of its stance on the assessment of learners. Whereas in medical education assessment is often called out as a distinct component of curricular design, this model takes a program evaluator's lens and folds learners' assessment outcomes into the program evaluation of the curriculum's success. This take on how assessment integrates into a curriculum can be quite confusing for those who are new to medical education - especially since there is an increasing emphasis on assessment of performance within our field. Some schools of thought hold that this model should be modified to include a seventh, separate step that encourages educators to design assessment experiences aligned with the goals, objectives, learning activities, and other evaluation outcomes.

Returning to the case...


Sally was introduced to the Six-Step Model of Curriculum Development and applied it to her EKG course for new interns. She identified the problem by conducting an informal survey of faculty and residents on the current approach to EKG teaching. Sally discovered that the interns had disparate backgrounds with respect to prior EKG knowledge. Some interns had only the cardiology lectures from the first two years of medical school, while others had delved into EKG reading with advanced courses and rotations in their clinical years. The inconsistencies in background were furthered by the variation in exposure to interesting cardiology
cases while working clinically. She then developed a general needs assessment by identifying the gap
between the ideal and current approach, and felt that dedicated coursework during intern year was
required. She tailored this to create a targeted needs assessment by taking into account the needs of
stakeholders (interns, faculty, residency program, school of medicine) and was able to determine that
while residents generally performed well in the cardiology portion of the in-service exam, there were
gaps in the knowledge of individual interns when applied to specific clinical cases. Sally then
composed goals and objectives. She determined that her goals were for interns to recognize the most
common abnormal EKGs encountered by emergency medicine physicians; specific measurable
objectives included increased accuracy in reading a selection of essential abnormal EKGs.

After some research, Sally decided that the best educational strategy was a longitudinal case-based
curriculum that would be implemented during intern morning report on conference days. She was
able to implement the course successfully by obtaining the support of the program director and
education faculty, introducing the curriculum after piloting and refining over the course of the
academic year. Finally, she obtained evaluation and feedback by creating formative and summative
individual assessments for individuals and a program evaluation of the course. After all her hard
work, all stakeholders agreed that her course was a smashing success!

References

1. Thomas PA, Kern DE, Hughes MT, Chen BY. Curriculum Development for Medical
Education: a Six-Step Approach. Baltimore: Johns Hopkins University Press; 2016.

2. Shappell E, Chan TM, Thoma B, et al. Crowdsourced Curriculum Development for Online Medical Education. Cureus 2017;9(12):e1925. doi:10.7759/cureus.1925.

3. Long DM. Competency based residency training: the next advance in graduate medical
education. Academic Medicine. 2000;75:1178-1183.

4. Lucas R, Choudhri T, Roche C, Ranniger C, Greenberg L. Developing a Curriculum for


Emergency Medicine Residency Orientation Programs. Journal of Emergency Medicine.
2014;46(5):701-705. doi:10.1016/j.jemermed.2013.08.132.

5. Sweet, LR, Palazzi, DL. Application of Kern’s Six-step approach to curriculum


development by global health residents. Educ Health 2015;28:138-41.

6. Duran-Gehring P, Bryant L, Reynolds JA, Aldridge P, Kalynych CJ, Guirgis FW.


Ultrasound-Guided Peripheral Intravenous Catheter Training Results in Physician-Level
Success for Emergency Department Technicians. J Ultrasound Med. 2016;
35(11):2343-2352.

7. Barsuk JH, Cohen ER, Wayne DB, Siddall VJ, McGaghie WC. Developing a Simulation-
Based Mastery Learning Curriculum. Simulation in Healthcare: The Journal of the
Society for Simulation in Healthcare. 2016;11(1):52-59.

8. Sherbino J. Educational design: a CanMEDS guide for the health professions. Royal
College of Physicians and Surgeons of Canada; 2011.
CHAPTER 2

The Kirkpatrick Model


Authors: Christopher Fowler, DO; Lisa Hoffman, DO; Shreya Trivedi, MD; Amanda Young, MD
Editor: Dimitrios Papanagnou, MD

A Case

Jane is an assistant program director (APD) at her residency program. In an effort to improve resident in-service scores, she was recently charged with increasing resident engagement during weekly didactic conference. She has already started to implement immersive activities, such as small group sessions, TED-talk-style lectures, and a longitudinal simulation curriculum. The department chair and the residency program director are concerned that these curricular modifications require a significant amount of faculty time and effort - at least when compared to previous academic conference offerings. They are not sure if the new curriculum is worth the resources required to sustain it in the long term.

Jane must evaluate the effectiveness of her educational programming in order to demonstrate its
value and validate the resources invested. How can Jane evaluate the impact of the new curriculum
on her residents’ learning?

OVERVIEW | KIRKPATRICK

Kirkpatrick's model is based on four levels of evaluation, where each level builds on the previous level. Level 1, the most basic level, aims to assess how a learner reacts to a specific training; it typically involves simple questions and aims to examine "customer satisfaction." Level 2 begins to assess how much knowledge participants learned from training. Assessment tools generally include tests, interviews, or other tools to assess the learners' knowledge following a training intervention. Level 3 aims to determine how trainees utilize newly acquired knowledge, and focuses primarily on actual behavior change. Level 4 seeks to determine what impact a specific training has had at the organizational level, or on the group as a whole.

MAIN ORIGINATORS OF THE THEORY


•Donald Kirkpatrick

Other important authors in this area:


•James and Wendy Kirkpatrick

Background
The Kirkpatrick Model is based on the work of Donald Kirkpatrick (1954), who initially developed it to determine whether the supervisors he trained were making a significant impact based on that training. His work has evolved over the decades to become one of the most frequently used models for evaluating training programs. In 1959 Kirkpatrick wrote four articles that established the basic principles of his approach to training evaluation. Over the next decade, the model evolved and became more widespread until it became one of the standards for industry evaluations. In the mid-1970s, after the widespread circulation of his ideas, Kirkpatrick was asked to write a book expanding on the ideas that constitute his model.1

The Four Levels: A Case Study

To better clarify this framework, consider the application of the Kirkpatrick framework to a
clinical example below.

A training hospital has noticed that the mortality rate related to sepsis increased over the past six months. The emergency medicine (EM) residency training program has been selected to train resident providers on the early recognition of sepsis. A two-week training course was developed and delivered to all EM residents. Below are examples of how Kirkpatrick's framework could be used to evaluate the effectiveness of this new training program on EM residents.

Level 1: Reaction - At the conclusion of the training course, residents are asked how much they enjoyed the course, specifically focusing on what they liked and/or disliked about the training. This information is collected using paper survey evaluations.

Level 2: Learning - One month after the initial training program, a refresher training session was delivered, along with a 25-item questionnaire, to determine how well the information was retained by residents.

Level 3: Behavior - EM residents are observed during several 12-hour shifts by faculty members. Assessments are made on their performance with regard to recognizing sepsis and meeting important metrics in treating sepsis. Direct feedback is collected after the observation sessions.

Level 4: Results - Mortality rates across the hospital related to sepsis are compiled and compared to rates before the educational intervention. Efforts are made to link organizational outcomes to EM resident involvement in the cases.

Modern takes on this Theory


The original purpose of Kirkpatrick’s framework was to provide business leaders with
easily identifiable and measurable outcomes in learners and the organizations for which
they worked. The success of this framework in business attracted interest from numerous
other fields (e.g. the annotated bibliography reference by Yardley and Dornan).
Kirkpatrick’s framework has been utilized to evaluate learning outcomes in the areas of
sales and marketing; computer skills; technical skills; human performance technology;
evaluation of workshops and conferences; business simulations; and “soft-skills” training,
such as team building.3

Kirkpatrick’s four levels have also been applied to medical education. For example, the
Best Evidence Medical Education (BEME) Collaboration adopted a modified version of
Kirkpatrick’s levels as a grading standard for bibliographic reviews. The authors developed
a prototype coding sheet using Kirkpatrick’s framework to appraise evidence and grade
the impact of education interventions, with the implication that measuring outcomes at a
higher Kirkpatrick level represented a greater quality of evidence.2,4 Additionally,
numerous researchers have used this approach to evaluate medical education literature and
determine best practices in medical education. This has been applied in arenas such as
interprofessional education initiatives,5,6 competency-based education and mastery
learning,7 teachers training workshops,8 faculty development interventions,9,10 patient
safety and quality improvement curricula,11,12 high-fidelity simulation as a learning tool,13 and
internet-based medical education.14

Over the years, various adaptations of the four levels of evaluation have been proposed.
Hamtini proposed an adaptation of Kirkpatrick’s framework to evaluate the e-learning
environment.15 The framework was simplified into three levels rather than four. The
interaction phase would be used to gauge learner satisfaction with the e-learning interface
and its ease of use. The learning phase would aim to measure actual learning using a pre-
test and post-test approach. The final results phase would aim to measure the ability of the
employee to function effectively and efficiently, as well as the overall intrinsic and extrinsic
benefits to both the employee and employer following e-learning. Shappell and colleagues
also proposed an adaptation of Kirkpatrick’s framework to use in the evaluation of the
online learning environment.16 For level 1, engagement was measured (e.g., involvement
in sharing and discussion, time-on-page) as well as satisfaction. For level 2, the authors
proposed including online quizzes and/or assignments within the curriculum. For level 3,
the authors recommended measuring transfer of learning through simulated environments,
in addition to the workplace.

In their review of the literature regarding interprofessional education initiatives, Barr et al.
proposed an expanded model of Kirkpatrick’s levels.5 They proposed that, at level 2, both
modification of attitudes and perceptions (level 2a) and acquisition of knowledge and skills
(level 2b) be measured. Additionally, they proposed that results be measured both at the
level of change in organizational practices (level 4a) and of benefits to patients (level 4b).5

In addition to these adaptations, James and Wendy Kirkpatrick (2016), Donald Kirkpatrick's son and daughter-in-law, respectively, proposed the New World Kirkpatrick Model (NWKM) in an attempt to address the many critiques of Kirkpatrick's original levels of evaluation.17 Among other things, they suggested expanding each level to account for confounders, argued that the levels do not have to be evaluated sequentially and that not all programs require the "full evaluation package," and recommended involving stakeholders in curriculum and evaluation tool development to help determine which outcomes are the most important to measure.17,18

Other Examples of Where this Theory Might Apply


Schumann and colleagues proposed that the Kirkpatrick framework could be utilized to evaluate
simulations as educational tools.19 While the authors aimed to evaluate business
simulations and their impact in the business world, the model could be easily applied to
medical education simulation. They suggested strategies, such as having a non-simulation
control group, utilizing pre- and post-tests, and reaching out to employers or other
observers beyond the learners to evaluate for higher level outcomes. Bewley and O’Neil
proposed an adaptation of Kirkpatrick’s framework that could be used to evaluate the
effectiveness of medical simulations.20 They suggested adding knowledge or skills
retention as a measure of sustainable behavioral change over time, perhaps making
evaluation at level 3 easier to obtain.

Limitations of this Theory


While Kirkpatrick’s framework has been noted over the years to have limitations when it
comes to evaluating medical education initiatives, the authors believe that it still offers
value for measuring the effectiveness of a new curriculum. Further development and
refinement of this theory is ongoing and aims to build on this foundation.

Returning to the case...


After learning about the Kirkpatrick model, Jane designs a more robust evaluation plan for her new
curriculum.

For level 1 outcomes, Jane distributes surveys to the residents regarding their satisfaction with the
new components of the conference’s curriculum. As an additional measurement at level 1, she
surveys the residents about what they feel they have learned from the new curriculum, and how
confident they are with applying what they have learned into clinical practice.

For level 2, Jane develops a tool for objectively measuring residents' knowledge and skills. After reviewing the learning goals and objectives, she develops a pre-test and post-test relating to these objectives. This objective measurement helps her ascertain whether her implementation is having a positive influence on resident education with regard to EM core content.
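As a concrete illustration of how such Level 2 data might be summarized, the short sketch below (in Python) tabulates paired pre-test and post-test scores and reports the average gain. The resident identifiers and scores are hypothetical placeholders used only to show the arithmetic; they are not drawn from the case.

# Minimal illustrative sketch (hypothetical data): summarizing Level 2 pre-/post-test results.
from statistics import mean

# Paired scores (out of 25 items) for each resident; values are invented for illustration.
pretest = {"R01": 14, "R02": 17, "R03": 12, "R04": 19, "R05": 15}
posttest = {"R01": 20, "R02": 22, "R03": 18, "R04": 23, "R05": 21}

# Per-resident gain and cohort-level summaries.
gains = {rid: posttest[rid] - pretest[rid] for rid in pretest}
print("Mean pre-test score: ", round(mean(pretest.values()), 1))
print("Mean post-test score:", round(mean(posttest.values()), 1))
print("Mean gain per resident:", round(mean(gains.values()), 1))
print("Residents who improved:", sum(g > 0 for g in gains.values()), "of", len(gains))

A summary like this only describes the cohort; a real evaluation would pair it with an appropriate statistical comparison and item-level review of the test.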

For level 3, the evaluation shifts to application. Residents must be observed in the clinical
environment, with particular attention to how they are applying knowledge learned through the new
curriculum into their clinical practice. This is achieved through direct observation of residents on
shift, and by surveying other faculty members as to what they observe when working with residents
in the emergency department. Alternatively, simulation scenarios are developed to provide an
opportunity to observe residents’ clinical skills as they relate to specific learning objectives. Jane
anticipates that the new curriculum will improve retention, and she plans to examine in-service
examination scores as another marker of impact.

For level 4, inarguably the most difficult level to measure, Jane would have to examine impact at the
organizational level, or perhaps at the level of patient outcomes. The method of measurement would
depend upon the learning goals and objectives. For example, if a simulation session was developed
for teaching procedural skills as a part of the new curriculum, Jane and the department would gather
data regarding rates of procedural complications following the educational intervention. This
information would be most useful if it were to be compared to complication rates prior to the
educational intervention.
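To make that before/after comparison concrete, here is a minimal sketch (in Python) of how complication rates might be compared across the two periods. The procedure and complication counts are hypothetical placeholders, not data from the case, and a real Level 4 analysis would also need to account for confounders and apply statistical testing.

# Minimal illustrative sketch (hypothetical counts): Kirkpatrick Level 4 outcome comparison.
before = {"procedures": 240, "complications": 18}   # period before the educational intervention
after = {"procedures": 260, "complications": 9}     # period after the educational intervention

rate_before = before["complications"] / before["procedures"]
rate_after = after["complications"] / after["procedures"]

print(f"Complication rate before: {rate_before:.1%}")
print(f"Complication rate after:  {rate_after:.1%}")
print(f"Absolute change: {rate_after - rate_before:+.1%}")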

Jane successfully completes all levels of Kirkpatrick's model, and the impact of her curriculum on resident education and on the hospital can be articulated to leadership.

Kirkpatrick’s framework does not readily apply to all educational interventions, and at times may
require some modification in order to assess an intervention’s effectiveness at achieving its stated
objectives. Determination of clear goals and objectives, as well as forethought into how the successful
achievement of these goals and objectives is to be measured at each level, are key to effectively using
this framework to evaluate educational programs.

ANNOTATED BIBLIOGRAPHY
Yardley S and Dornan T. Kirkpatrick’s levels and education ‘evidence’. Medical Education
2012;46:97-106.

Kirkpatrick’s purpose in developing the four-level evaluation framework was to provide business leaders
with promptly identifiable and easy-to-measure outcomes from learners and the organizations for which
they worked. None of Kirkpatrick’s original references to successful application of the levels came from
fields as complex as medical education. Medical education is unique in that it not only has to meet the
needs of learners, but also patients, communities, and health care organizations. The BEME collaboration
adopted a modified version of Kirkpatrick's levels as a grading standard for bibliographic review. Not all
evaluation tools, however, are suitable for evaluating all educational programs; the evaluation method
should fit the question being asked and the type of evidence being reviewed. Kirkpatrick’s levels are
suitable for simple training interventions, where outcomes emerge rapidly and can be easily observed.
They are unsuitable for more complex educational interventions, where the most important outcomes are
longer term.

Moreau KA. Has the new Kirkpatrick generation built a better hammer for our evaluation
toolbox? Medical Teacher 2017;39(9):999-1001.

Kirkpatrick’s framework has many limitations and has been critiqued by many as inadequate for
assessing medical education. Critiques include the difficulty of evaluation at levels 3 and 4; neglect of
confounding variables; the unfounded causal chain of having to progress from level 1 through 4
sequentially; and the inability of the model to show why educational programs work. This article
discusses the New World Kirkpatrick Model (NWKM) as proposed by Kirkpatrick’s son and daughter-in-
law, and how it aims to address these critiques. The NWKM suggests involving stakeholders in
curriculum development and development of evaluation tools to determine which outcomes are most
important to measure. They also suggest expanding levels in order to account for confounders.
Additionally, they argue that the levels do not have to be evaluated sequentially, and not all programs
require the “full evaluation package”.

Praslova L. Adaptation of Kirkpatrick’s four level model of training criteria to assessment of


learning outcomes and program evaluation in higher education. Educ Asse Eval Acc
2010;22:215-25.21

This article makes various suggestions regarding the application of Kirkpatrick’s framework to higher
education. The author suggests splitting level 1 into both affective reactions (i.e., enjoyment) and utility
judgements (i.e., how much they believe they have learned). At level 2, the author suggests that writing
samples and speeches could also be used to gauge learning, in addition to traditional pre- and post-tests.
For level 3, the use of knowledge and skills in future classes, internships, or research projects could be
used to evaluate the impact of education on behavior. Finally, the author suggests that at level 4, the
beneficiary of the education first needs to be clarified, whether that be the student, society, or the
organization, before any impact at this level can be measured.

References
1. Our Philosophy. Kirkpatrick Partners, The One and Only Kirkpatrick Company®.
https://www.kirkpatrickpartners.com/Our-Philosophy. Accessed July 17, 2018.

2. Yardley S and Dornan T. Kirkpatrick’s levels and education ‘evidence’. Medical


Education 2012;46:97-106.

3. McLean S and Moss G. They’re happy, but did they make a difference? Applying
Kirkpatrick’s framework to the evaluation of a national leadership program. The
Canadian Journal of Program Evaluation 2003;18(10):1-23.

4. Hammick M, Dornan T and Steinert Y. Conducting a best evidence review. Part 1: from
idea to data coding. BEME guide no. 13. Medical Teacher 2010;32(1):3-15.

5. Barr H, Freeth D, Hammick M, Koppel I, and Reeves S. Positive and null effects of
interprofessional education on attitudes toward interprofessional learning and
collaboration. Advances in Health Science Education 2012;17(5):651-69.

6. Hammick M, Freeth D, Koppel I, Reeves S, and Barr H. A best evidence systematic


review of interprofessional education: BEME guide no. 9. Medical Teacher
2007;29(8):735-51.

7. Bisgaard CH, Rubak SLM, Rodt SA, Pertersen JAK, and Musaeus P. The effects of
graduate competency-based education and mastery learning on patient care and return
on investment: a narrative review of basic anesthetic procedures. BMC Medical
Education 2018;18:154. doi: 10.1186/s12909-018-1262-7.

8. Piryani RM, Dhungan GP, Piryani S, and Neupane MS. Evaluation of teachers training
workshop at Kirkpatrick level 1 using retro-pre questionnaire. Advances in Medical
Education and Practice 2018;9:453-7.

9. Steinert Y, Mann K, Anderson B, Barnett BM, Centeno A, Naismith L, Prideaux D,


Spencer J, Tullo E, Viggiano T, Ward H, and Dolmans D. A systematic review of faculty
development initiatives designed to enhance teaching effectiveness: A 10-year update:
BEME guide no. 40. Medical Teacher 2016;38(8):769-86.

10.Steinert Y, Mann K, Centeno A, Dolmans D, Spencer J, Gelula M, and Prideaux D. A


systematic review of faculty development initiatives designed to improve teaching
effectiveness in medical education: BEME guide no. 8. Medical Teacher
2006;28(6):497-526.

11.Walpola RL, McLachlan AJ, and Chen TF. A scoping review of peer-led education in
patient safety training. American Journal of Pharmaceutical Education 2018;82(2):115-23.
12.Wong BM, Etchells EE, Kuper A, Levinson W, and Shojania KG. Teaching quality improvement and patient safety to trainees: a systematic review. Academic Medicine 2010;85(9):1425-39.

13.Issenberg SB, McGaghie WC, Petrusa ER, Gordon DL, and Scalese RJ. Features and uses
of high-fidelity medical simulations that lead to effective learning: a BEME systematic
review. Medical Teacher 2005;27(1):10-28.

14.Wong G, Greenhalgh T, and Pawson R. Internet-based medical education: a realist


review of what works, for whom, and in what circumstances. BMC Medical Education
2010;10:12.

15.Hamtini TM. Evaluating e-learning programs: an adaptation of Kirkpatrick’s model to


accommodate e-learning environments. Journal of Computer Science 2008;4(8):693-8.

16.Shappell E, Chan T, Thoma B, Trueger NS, Stuntz B, Cooney R, and Ahn J.


Crowdsourced curriculum development for online medical education. Cureus
2017;9(12):e1925.

17.Kirkpatrick JD, Kirkpatrick WK. 2016. Kirkpatrick's four levels of training evaluation.
Alexandria (VA): ATD Press.

18.Moreau KA. Has the new Kirkpatrick generation built a better hammer for our
evaluation toolbox? Medical Teacher 2017;39(9):999-1001.

19.Schumann PL, Anderson PH, Scott TW, and Lawton L. A framework for evaluating
simulations as educational tools. Developments in Business Simulation and Experiential
Learning 2001;28:215-20.

20.Bewley WL and O’Neil HF. Evaluation of medical simulations. Military Medicine


2013;178(10):64-75.

21.Praslova L. Adaptation of Kirkpatrick’s four level model of training criteria to


assessment of learning outcomes and program evaluation in higher education. Educ
Asse Eval Acc 2010;22:215-25.

CHAPTER 3

Realist Evaluation
Authors: Jason An, MD; Christine Stehman, MD; Randy Sorge, MD
Editor: Jordan Spector, MD

A Case
City Hospital recently hired Claire as a senior nursing administrator. Claire was recruited after
working for ten years across town at Ivory Tower Hospital, spearheading a number of operational
innovations to improve Emergency Department (ED) metrics. City Hospital hired Claire with hopes
that she could replicate the interventions and similarly improve metrics.

In her first departmental meeting, Claire proposed a number of system revisions with regard to
triage processes at the City Hospital. Claire recommended moving patients to the treatment area
immediately upon arrival in the ED, to permit physician evaluation, registration and triage to occur
simultaneously, with the goal of reducing time to provider and total patient time in the ED. Her
proposal included the creation of a ‘sort RN’ who would briefly assess patient acuity and assign the
patients to one of two treatment areas – a lower acuity venue with mostly chairs, or a higher acuity
area with stretchers. After the patient was assigned, the patient would be triaged by the nurse,
registered, and seen by a physician in the treatment room in seamless succession. In her experience,
this model worked well at Ivory Tower Hospital, so she reasoned that it should also work at City
Hospital.

Though several physicians expressed concerns with the new triage process, Claire was confident it
would work. After all, it did at Ivory Tower Hospital! Claire privately dismissed the concerns as
systemic resistance to change (especially change proposed by a newcomer). In the end, the
Department Chair supported Claire's proposal, and, within weeks, City Hospital's ED remodeled its triage system to resemble the workflow at Ivory Tower Hospital.

ED administrators met again eight weeks after the new triage operations were implemented. During
this session, it seemed that everyone was unhappy, calling the new system “a disaster”. Physicians
expressed concern over inaccurate triage, citing significant delays in obtaining vital signs and EKGs
for their patients. There had been several high-risk cases where an otherwise well-appearing patient
was triaged to the low acuity zone, only to have been identified as ill, delaying appropriate care.
Furthermore, there was no clear process for “upgrading” patients from the low acuity zone to the
high acuity zone. Hospital leadership complained to the ED directors that ED metrics looked worse,

27
as a larger number of patients were leaving the ED prior to being fully registered, resulting in lost
revenue relative to the previous City Hospital triage model.

The Department Chair calls Claire into his office. He wants to hear her thoughts about why her
system failed to improve metrics at City Hospital as they had at Ivory Tower Hospital. Claire is
baffled, embarrassed, and needs to find a way to make things right.

OVERVIEW | REALIST EVALUATION

Realist (or realistic) evaluation (RE) emerged in the 1990s as a method to evaluate the effectiveness of complex social programs or systems. RE is not a theory but a methodologic approach to the evaluation and assessment of complex social systems. Prior to the advent of realism, investigators would attempt to improve complex social programs by implementing an intervention and assessing its benefit through a simple cause-and-effect analysis; that is, asking "can we make the program better with a particular intervention?" If the intervention was associated with a desired result, investigators would argue that the intervention was 'the cause' that directly led to the desired 'effect'. For program directors, it followed that the same intervention might work to achieve the desired result anywhere it was implemented. Ultimately, these hypotheses were flawed, as the same programmatic intervention can lead to unexpected outcomes in different milieus.

The theoretical basis for RE was derived in large part from Roy Bhaskar, an economist and philosopher who wrote a series of books to coin and describe 'critical realism'.1 Critical realism attempts to reconcile what can be known from rigorous hypothesis testing (positivism) with what is known through the perception and understanding of individual observers, independent of empirical testing (constructivism). Ray Pawson and Nick Tilley built upon Bhaskar's theoretical framework to describe the RE method - a program evaluation based on critical realism that requires the assessment of multiple variables: the interactions of the multiple participants, the presence of variable resources, and the environment in which the program occurs.2,3 In RE, instead of asking simply if a program works, evaluators attempt to answer why and what about a program works, for whom, and within what circumstances.2 RE provides program implementers with a methodology for optimizing complex programs through the systematic examination of resources, environment, and context, all in an attempt to promote a desired outcome for program participants.2,3

MAIN ORIGINATORS OF THE THEORY


•Ray Pawson & Nick Tilley

Other important authors in this area:


•Roy Bhaskar (Critical Realism)
•Geoff Wong
•Trish Greenhalgh

Background
Realist Evaluation is a research methodology that focuses on how programs work in order to optimize a program's effectiveness.4 RE has historically been used to assess social programs, such as installing cameras in parking lots to decrease car break-ins or offering literacy lessons to increase parental involvement in schools. RE is now utilized to evaluate any program designed to change behaviors.4 In the days before Bhaskar, researchers would follow a successionist theory of causation: investigators would set up a trial program in one area and compare outcomes of that program to outcomes in a similar area where the program had not been implemented (a control area). Results of these "experiments" led investigators to declare success (or not) through inferred causation. However, when investigators attempted the same intervention in a new location and it did not succeed, program stakeholders were confused.

Based on the principles of critical realism as set forth by Bhaskar and others1, Pawson and
Tilley argued that simple ‘cause and effect’ analyses were not applicable within complex
systems and programs, as they may not account for all environmental factors that
contribute to the success or failure of such programs13. The authors put forth a generative
theory of causation which relies on observations of patterns between inputs (causes) and
outputs (effects), including both objective external features and internal (possibly
unrecognized) features. These observations allow for a more nuanced examination of cause
and effect, both when the cause is effective towards the desired ends, and when it is not.
The authors cite the use of gunpowder as an example: gunpowder is an effective explosive
specifically when the conditions and circumstances are optimized – densely packed, in sufficient quantity, dry, and so on.2

Within this construct, the efficacy of a social program can only be understood with
recognition of context. In other words, only through identification of a program’s
interaction with cultural, social and economic factors can an investigator determine how
and why a program would work to produce (or not produce) the desired outcome.

RE evaluates mechanisms suspected to bring about change and asks in which contexts
these changes occur. It does so through a cycle of experimentation. First, investigators
formulate a theory of how everything (intervention, choices, relationships, behavior,
conditions) comes together to generate an outcome. Then they generate hypotheses about
what might produce a desired, sustainable change. With appropriate methods (often both qualitative and quantitative), they test these hypotheses, with the goal of producing results or outcomes that are highly specific to the context. Should the results deviate from what was expected,
theories are modified, and the cycle continues.2

In essence, RE is an investigative methodology that can incorporate data from a myriad of
sources to find ways to identify, state, test, and predict what about a program works, why it
works, for which population, and in what circumstances.

Modern takes on this Theory


In addition to RE, the term ‘realist synthesis’ or ‘realist review’ was coined to describe the
synthesis of data when analyzing complex systems – not to distill complex issues down to
simple descriptions, but to guide programmatic leadership and policy makers with
sophisticated and practical analyses to utilize in the planning and administration of local or
regional programs.5-7

With such a broad definition, one can see how this construct might be applied to a number
of additional areas of study, beyond social programs. In the 20 years since Realistic
Evaluation was published, realist evaluation has been applied in the areas of medicine and
healthcare systems to improve infection control, interpersonal-skills assessment, disease-
specific health initiatives (such as heart health and mental illness treatment), e-learning,
faculty development, and medical education in general.8-15

There have been three modifications of RE that warrant acknowledgement here.

Keller et al. described the combination of RE with design theory as a means to evaluate
complex innovations.16 The authors advocate that innovation should begin with explicit
identification of the underlying assumptions behind the innovation, before performing a
realist evaluation, to better understand context-mechanism-outcome triads evident after
implementation. The authors argue that this combination method will buttress the expected
efficacy and increase dissemination of such innovations.16

Bonell et al. offered a counter to the typical RE position (set forth by Pawson and Tilley)
that randomized controlled trials are too narrowly defined to be pertinent in the assessment
of complex public interventions. This piece is long and largely theoretical, but it sets forth a
series of examples where the data from RCTs led to a better understanding of a complex
system, and the authors take the position that there need not be tension between RCTs and
realism.17

Finally, Ellaway et al. describe a hybrid approach to systematic review.18 This study sought to identify if and how communities that host medical education programs impact upon said programs.18 Their literature search identified a number of papers for investigation, though only approximately half reported hard data (rather than a narrative or theoretical exposition).18 All studies were examined using both an outcomes method of systematic
review as well as a realist evaluation. Overall, the authors argue that this dual method of
review created a deeper understanding of their study question and the literature as a
whole.18

Other Examples of Where this Theory Might Apply
One might argue that given the complex dynamics of any educational setting, with
variability in the instructor, in the students (age, level of schooling, etc.), and in the possible
learning environments, RE could be used to evaluate nearly any educational program. The
two examples below will be familiar to many of us in medical education.

Problem-based learning (PBL) is a curricular provision in many medical schools where learners examine complex patient care vignettes in small group discussions. PBL was devised as a means to make medical learning more active and interesting for the learner, and some data suggest that graduates of PBL curricula demonstrate equivalent or superior professional competencies compared with graduates of a more traditional medical school curriculum.19 The PBL model is a form of RE for the learner, as lessons address clinical issues with respect to the whole of the patient. In addition, as PBL has been shown to be an effective learning strategy but is variably used across schools, RE might aid in an analysis of which schools, which learners, and which circumstances would allow students to benefit most from a PBL curriculum.

There has been an exponential increase in the number of online learning resources in
medical education (podcasts, websites, and blogs – termed E-learning).20 A realist review of
internet-based medical education by Wong et al. sought to describe who is using these resources, when they are using them, and to what benefit. The authors demonstrate that learners were more likely to use E-learning if it offered a perceived advantage over non-internet alternatives, was technically easy to use, was compatible with the learner’s values and norms, and provided an opportunity to interface with a teacher or tutor.13

Limitations of this Theory


Depending on the size and breadth of a program, realist evaluation can be both time and
resource intensive.

For large programs, a realist evaluation can require input from a broad interdisciplinary
team, including members with high levels of experience and training, to carry out complicated evaluations.7 RE will not be quick; it takes time to analyze all variables and to
accurately understand the interactions between interventions and outcomes, particularly in
complex systems.

Any evaluator hoping to use RE must be deliberate in considering logistics: the scope of the
analysis; the depth of assessment for each of a multitude of variables; the source of
foundational literature; the appraisal of primary studies; how to collate, analyze, and synthesize the findings; and how to make recommendations that are applicable in the
correct context.22

Finally, a realist review attempts to describe how well a program works through
examination of a multitude of variables. But because RE cannot account for every extant
variable, realist studies tend to produce tentative recommendations at best. RE cannot, by its very nature, produce generalizable recommendations, specifically because all conclusions are context specific. Ultimately, the results of an RE often provide recommendations for fine-tuning and optimization rather than comprehensive, revolutionary system improvements.6

Returning to the case...


Claire’s system failed because she did not have a full understanding of all the contributors to the
triage process in the Emergency Department of her new hospital. Despite her operations experience
and her best intentions, Claire missed the site-specific operational differences, as well as the
environmental and cultural conditions that directly impact the efficiency of triage at her new
institution.

After some education in the RE model of program refinement, Claire tried to address individual
features that had bearing on the efficiency of ED triage at City Hospital. Slowly, Claire learned
enough from the various stakeholders to refine her triage model successfully.

She learned that City Hospital has significantly fewer staff per patient than her previous institution. Claire’s original triage model left registration staff completely overwhelmed. A new and improved triage workflow needed a mechanism to prevent providers from discharging a patient before registration was complete. Claire worked with Information Technology staff to create a ‘hard stop’ in the EMR
to prevent placement of a discharge order until after ED registration was complete.

Claire also learned that City Hospital has many more non-English-speaking patients than her
prior hospital, creating a language barrier in all facets of the ED visit, a variable she never had to
account for at Ivory Tower Hospital. Claire worked to place language-line phones in every treatment
room. In addition, Claire hired volunteers to work in the ED waiting room, offering non-medical
assistance, to reduce the number of patients who left without being seen.

It was specifically through Claire’s effort to better understand the intricacies of her new hospital (i.e., how the system currently operates, its staffing capabilities, its patient population) that operational metrics began to improve at City Hospital. In fact, about one year after Claire’s arrival, with the new system born of Claire’s realist evaluation of City Hospital, the ED at City posted some
of the best efficiency metrics in the history of the institution.

ANNOTATED BIBLIOGRAPHY
Pawson R, Tilley N. Realistic Evaluation Bloodlines. Am J Eval. 2001;22(3):317-324.
doi:10.1177/109821400102200305.21

Written by the authors most associated with RE, this piece describes the RE method in the context of
regional blood donation practices. The authors relay six articles that address different ways of acquiring
and distributing blood. The article concludes with six maxims to improve future evaluations, maxims
which many consider foundational within the realist approach. These include:
1. Always speak of evaluations in the plural – advocating for a broad array of investigative questions of
one’s program model, the combination of which are necessary to better understand and optimize
complex programs.
2. Be unafraid to ask big questions of small interventions and to use small interventions to test big
theories.
3. Use multiple methods and data sources in the light of opportunity and need.
4. Figure out which mechanisms are relevant to produce optimum outcomes by context.
5. Never expect to know “what works”, just keep trying to find out.

6. Direct meta-analytic inquiries at common policy mechanisms – advocating for a thorough evaluation of
the attempts at a particular intervention in multiple programs in a community, with the various
outcomes, to better understand the whole.21

Wong G, Greenhalgh T, Westhorp G, Pawson R. Realist methods in medical education research: What are they and what can they contribute? Med Educ. 2012;46(1):89-96.15

RE is relevant to medical education research and practices. This article explains realism theory in detail
and prescribes key principles in the performance of realist research. The authors include concrete
examples of the circumstances within medical education in which realist approaches can be used
effectively, a feature that makes this article worthwhile for medical educators.

Wong G, Greenhalgh T, Pawson R. Internet-based medical education: A realist review of what works, for whom and in what circumstances. BMC Med Educ. 2010;10(1).13

This is a realist review of internet-based medical education. The authors use two theories (Davis’s
Technology Acceptance Model and Laurillard’s model of integrative dialogue) to outline the
characteristics of internet-based medical education, arguing the intuitive point that internet learning
materials have value relative to the learner and the learning context. The authors provide a list of
questions based on their research in order to help educators and learners choose the appropriate internet-
based course for their specific situation.

References
1. Graeber, D. Roy Bhaskar obituary. The Guardian. 2014. Accessed Dec 29, 2019.
Available at: https://www.theguardian.com/world/2014/dec/04/roy-bhaskar.

2. Pawson R, Tilley N. Realistic Evaluation. Sage; 1997.

3. Tilley N, Pawson R. Realistic Evaluation: An Overview. Br J Sociol. 2000;49:331. doi:10.2307/591330

4. Powell R. Evaluation Research: An Overview. Libr Trends. 2006;55(1):102-120.

5. Pawson R, Greenhalgh T, Harvey G, Walshe K. Realist Synthesis: An Introduction. Manchester ESRC Res Methods Program. February 2004.

6. Wong G, Greenhalgh T, Pawson R. What is realist review and what can it do for me? An introduction to realist synthesis. Accessed 27 June 2018. Available at: https://pram.mcgill.ca/i/Wong_G_JUNE09_what_is_a_realist_review_presentation.pdf

7. Pawson R, Greenhalgh T, Harvey G, Walshe K. Realist review – a new method of systematic review designed for complex policy interventions. Journal of Health Services Research & Policy. 2005 Jul;10(1_suppl):21-34.

8. Williams L, Burton C, Rycroft-Malone J. What works: a realist evaluation case study of intermediaries in infection control practice. J Adv Nurs. 2013;69(4):915-926. doi:10.1111/j.1365-2648.2012.06084.x

9. Greenhalgh T, Humphrey C, Hughes J, Macfarlane F, Butler C, Pawson R. How do you modernize a health service? A realist evaluation of whole-scale transformation in London. The Milbank Quarterly. 2009 Jun;87(2):391-416.

10. Meier K. A realistic evaluation of a tool to assess the interpersonal skills of pre-registration nursing students. Doctoral thesis, 2012. Accessed on August 6, 2020. Available at: https://openaccess.city.ac.uk/id/eprint/2097/1/Meier,_Katharine.pdf

11. Clark AM, MacIntyre PD, Cruickshank J. A critical realist approach to understanding and evaluating heart health programmes. Health: An Interdisciplinary Journal for the Social Study of Health, Illness and Medicine. 2007;11(4):513-539. doi:10.1177/1363459307080876

12. Chidarikire S, Cross M, Skinner I, Cleary M. Treatments for people living with schizophrenia in Sub-Saharan Africa: an adapted realist review. Int Nurs Rev. 2018;65(1):78-92. doi:10.1111/inr.12391

13. Wong G, Greenhalgh T, Pawson R. Internet-based medical education: A realist review of what works, for whom and in what circumstances. BMC Med Educ. 2010;10(1). doi:10.1186/1472-6920-10-12
14.Sorinola OO, Thistlethwaite J, Davies D, Peile E. Realist evaluation of faculty
development for medical educators: What works for whom and why in the long-term.
Med Teach. 2017;39(4):422-429. doi:10.1080/0142159X.2017.1293238

15. Wong G, Greenhalgh T, Westhorp G, Pawson R. Realist methods in medical education research: what are they and what can they contribute? Med Educ. 2012;46(1):89-96. doi:10.1111/j.1365-2923.2011.04045.x

16. Keller C, Lindblad S, Gäre K, Edenius M. Designing for Complex Innovations in Health Care: Design Theory and Realist Evaluation Combined. doi:10.1145/1555619.1555623

17. Bonell C, Fletcher A, Morton M, Lorenc T, Moore L. Realist randomised controlled trials: A new approach to evaluating complex public health interventions. Soc Sci Med. 2012;75(12):2299-2306. doi:10.1016/J.SOCSCIMED.2012.08.032

18. Ellaway RH, O’Gorman L, Strasser R, et al. A critical hybrid realist-outcomes systematic review of relationships between medical education programmes and communities: BEME Guide No. 35. Med Teach. 2016;38(3):229-245. doi:10.3109/0142159X.2015.1112894

19.Neville AJ. Problem-based learning and medical education forty years on. A review of
its effects on knowledge and clinical performance. Med Princ Pract. 2009;18(1):1-9.
doi:10.1159/000163038

20.Cadogan M, Thoma B, Chan TM, Lin M. Free Open Access Meducation (FOAM): the
rise of emergency medicine and critical care blogs and podcasts (2002-2013). Emerg
Med J. 2014;31(e1):e76-7. doi:10.1136/emermed-2013-203502

21. Pawson R, Tilley N. Realistic Evaluation Bloodlines. Am J Eval. 2001;22(3):317-324. doi:10.1177/109821400102200305

22. Greenhalgh T, Wong G, Westhorp G, Pawson R. Protocol – Realist and meta-narrative evidence synthesis: Evolving Standards (RAMESES). BMC Med Res Methodol. 2011. doi:10.1186/1471-2288-11-115
CHAPTER 4

Mastery Learning
Authors: Michael Barrie, MD; Shawn Dowling, MD, FRCPC; Nicole Rocca, MD, FRCPC
Editor: Jordan Spector, MD

A Case
As the residency program director in the ED at ALiEM Medical Center (AMC), you notice wide
variability amongst your resident-learners in their performance on the annual in-training exam, even amongst learners in the same post-graduate year. This problem is further magnified when Phil,
one of your highly regarded PGY3 residents, performs quite poorly during a practice oral exam – the
exam preceptor identified large gaps in Phil’s understanding of basic cardiac physiology and in the
use of vasopressors.

In your discussions with Phil, it becomes clear that he has a poor grasp on a number of fundamental
concepts within critical care management. He is aware of this and states: “I’ve been meaning to sit
down one day and study this stuff but residency is so busy.”

It is not clear to you why your current curriculum is sufficient for many of your residents to
demonstrate competency in the management of critically ill patients, but Phil, a smart and hard-
working resident, lags behind. As you review your didactic curriculum, you note that your learners
receive a basic cardiac physiology introductory lesson during intern orientation, and a few other
lectures led by a guest speaker (a cardiologist) who focused heavily on advanced topics and
controversies in the literature. That foundation was sufficient for some learners to do well on the
exam, but not for Phil. You wonder if there could be a better curriculum design to ensure that all of
your learners achieve mastery on fundamental competencies prior to taking on more advanced topics
and lessons.

OVERVIEW | MASTERY LEARNING
The founding principle of Mastery Learning (ML) theory is that the majority of students
can attain a high level of achievement if provided proper instruction and sufficient time.
Benjamin Bloom, a founder and proponent of this theory, argued that when a cohort of
students with a normal distribution of aptitude are provided equal instruction within an
equal amount of time, a portion of students will not attain mastery of that subject. Within
that model, achievement is directly correlated with individual learner-aptitude.1
However, if each student is provided with as much time as he or she requires within a
lesson topic, then the majority of learners could be expected to achieve mastery.2 A central
thesis in ML is that “mastery” must be pre-defined (e.g. the criteria required to earn an
“A” grade). In addition, ML requires that teachers provide formative assessments
at the end of each learning unit, as well as targeted corrective feedback for those who do
not attain the mastery level on the first attempt. Overall, the ML instructional theory
espouses the following principles:

1. Clear description and delineation of learning objectives within the curriculum.

2. Division of the lesson plan into discrete learning units, with sequential provision of the
content.

3. Instruction of each discrete unit for mastery. As such, all students are taught material in
a single unit with standard methods, and then examined for mastery of that unit.
Additional instruction is provided to students who have not achieved mastery, until
they meet the predefined standard.

4. Student evaluation reflects mastery of the curriculum as a whole (rather than achievement relative to classmates).2

MAIN ORIGINATORS OF THE THEORY


• Benjamin Bloom
• James Block
• Robert Burns

Other important authors in this area:


• John B. Carroll
• Fred S. Keller
• Carleton Washburne

Background
Though ML theory gained popularity in the 1960s with Bloom’s work, its roots can be
traced to the work of Carleton Washburne in the 1920s. Washburne was an educator in Winnetka, Illinois, and while there he developed the Winnetka Plan.3 The Winnetka Plan was
formulated in response to the extant elementary school grading system that expected all
learners to progress identically. The Winnetka Plan introduced a curriculum that taught the
‘common essential’ subjects (reading, writing, arithmetic) with individualized curricula for
students of different aptitudes. Students progressed to new content only after
demonstrating mastery of the level below. The locally implemented curriculum change saw
some dissemination, but only transiently.

Some time thereafter, John B. Carroll pioneered his own conceptual model of learning and school development. Carroll was another thought leader who championed the idea that most students can attain a certain criterion level within a subject when given enough time.4 He defined learning as a function of the actual time spent learning relative to the time needed. Time spent learning is related to learner perseverance plus opportunity, whereas the time required for learning is related to a learner’s innate aptitude, the quality of instruction, and the learner’s comprehension of that instruction. Visually, the theory4 can be represented, in simplified form, as:
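
Degree of learning = f (time actually spent learning / time needed to learn)

where the time spent reflects the learner’s perseverance and the opportunity (time) allowed for learning, and the time needed reflects the learner’s aptitude, the quality of instruction, and the learner’s ability to understand that instruction.4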

From there, ML theory was coined based on the work of thought leaders in two distinct domains: Benjamin Bloom in education, and Fred Keller in psychology.
Bloom believed that knowledge acquisition need not correlate directly with learner
aptitude. He argued that learners should be provided individualized instruction for a
duration of time specific to their needs, so the majority of students could achieve mastery.
The ML model would specify that students can succeed within a curriculum regardless of
aptitude (i.e. the correlation between aptitude and learning would approach zero).1

Contemporary with Bloom’s work, Fred S. Keller gained publicity as a theoretician in the
domain of psychology. Keller developed the Personalized System of Instruction (PSI), which
built upon B.F. Skinner’s work in operant conditioning, modifying that construct for classroom application.5 PSI described an individualized, learner-paced approach that was
not easily applied to the conventional classroom setting, where multiple students learn
together within finite time frames. It was not until Keller and his colleagues revamped the
PSI system to facilitate teaching to multiple learners simultaneously, and adapted it to
include discrete learning units, that the theory gained traction in school curricula. The core
elements of the PSI strategy were described by Keller and Sherman as follows:

Defining Mastery

• Curriculum divided into teaching-learning units.

• Objectives for mastery defined for each unit.

Planning for Mastery

• Educational resources provided by educators within each unit.

• Feedback/corrective materials developed for students that require remediation.

Teaching for Mastery

• Student-specific rate of progress through teaching-learning units.

• Students take the mastery test when sufficiently prepared, and they move on to the
next unit only after they pass the mastery test. If they do not demonstrate mastery,
students use corrective materials to address learning gaps.

Grading for Mastery

• Criteria for mastery (i.e., performance on a final exam) are defined by educator policy, and not by performance relative to peers.5

While the ML and PSI theories are grounded in different realms, they are similar in that
they are both founded on the principle that almost all students can master what they are
taught if they understand the learning objectives and are given enough time, instruction appropriate to their specific baseline level, and corrective methods targeting areas of difficulty.

Other Examples of Where this Theory Might Apply


Due in part to advancements in digital technology, principles of ML have been
incorporated into various educational programs in recent years. One example is the flipped
classroom didactic paradigm.6 In this model, educational content is pre-defined and posted
in a learning management system so that students may review this material ahead of a lesson, working at their own pace. The flipped classroom model allows students to move
quickly through familiar content and spend more time on the material that they find
challenging. This model permits learners to engage in higher order discussions, or work
through advanced applications of the learning content when in combined lessons. The
flipped model is a modern paradigm based on the mastery learning approach.

Another modern example of ML is gamification or serious games in medical education.7


Gamification describes the use of gaming mechanics to encourage learners to engage with new content. Meanwhile, serious games are games developed to convey or teach an educational objective (rather than to be used for amusement alone). In most iterations of gaming, students may work through the educational program and its objectives at their own pace. The gamer can only advance to the next unit after achieving defined
mastery at the current level.

Many recent educational initiatives set forth by the Accreditation Council for Graduate
Medical Education (ACGME) or the Royal College of Physicians and Surgeons of Canada
(RCPSC) align with the ML theory of education.8,9 With the advent of competency-based education and the implementation of the ACGME milestones and RCPSC entrustable professional activities, national leaders in medical education are setting forth individual standards that all learners must achieve to qualify for graduation.8 The milestone project in particular is well suited to the Mastery Learning model, as it has different levels that learners should move through sequentially during training, defining criteria that a learner must achieve to be promoted.

A competency-based education curriculum can incorporate the principles of ML, but given
the breadth and scope of materials covered in contemporary medical school curricula, an ML approach to the entirety of medical school teaching is logistically challenging and likely impractical. It may be more feasible to incorporate mastery learning into specific aspects of a medical curriculum, such as procedure teaching and/or simulation. Here, tasks can be clearly defined and assessed, with development of a timeline to allow learners to progress at their own pace.

Limitations of this Theory


ML-based teaching will often require a significant time investment from both teacher and learner.10,11 Be it the development of additional learning aids and materials to assist slower learners, or the extra instructional time those learners require, ML-based education takes time. And the time invested can be difficult to estimate in advance, as it will, by nature, be specific to and dependent on both the learner and the task. The logistics of delivering this
type of course can also be challenging, as the students who quickly complete a curricular
block will need enrichment activities to work on while others complete the remediation
materials.

The methods intrinsic to ML are well suited to procedural learning. Skills such as central line insertion or thoracostomy are amenable to such instruction, as they can be broken down into discrete, easily observable and correctable tasks.12 However, another criticism of ML theory is that the focus on individual units of teaching may fragment learning and limit comprehension of sophisticated concepts. Inui described this limitation with a ballet analogy: a dancer can perfect each of a number of individual ballet positions successfully, but
it does not guarantee integration of these basic positions into an artful performance of
Swan Lake.13 The author writes “Fragmentary assessment of individual tasks in mastery
learning could become a dead end in competency evaluation instead of serving as a
stepping stone to a more holistic assessment of key competencies in integrated processes of
care.”13 As such, a strict implementation of an ML-based lesson could hinder broad
contextual understanding in cases where a sophisticated example is necessary to
comprehend basic concepts.

Returning to the case...


In part to address knowledge deficits amongst some of your otherwise competent residents (like Phil),
you seek to create learning modules within a number of key topics in emergency medicine, each with
clear objectives and a structured progression. You do so by developing an asynchronous flipped-
classroom curriculum, where every learner may progress at his or her own pace – graduating to the
next module only after he or she has met predefined standards in the prior module. Resident-learners
are administered a pre-test to highlight the students that may need special attention. Faculty pre-
record lectures and provide teaching notes on core content. These include core objectives for each
curricular block. During conference, time is allocated for small group discussion of the block content.
At the end of each curricular block, students are assessed using simulation and multiple choice
questions to test core objectives. Residents that do not attain mastery are provided a remediation
assignment. Only those learners who have mastered the content are allowed ‘enrichment activities’
such as research opportunities, advanced study and related quality improvement projects.

Initially there was resistance to this curricular overhaul from faculty and residents. However, the
participants started to ‘see the light’ after experiencing the curriculum and realizing that it is student-focused, with the goal of bringing everyone up to pre-defined expectations.

ANNOTATED BIBLIOGRAPHY
McGaghie WC. Mastery learning: it is time for medical education to join the 21st century.
Academic Medicine. 2015;90(11):1438-41.14

This manuscript offers a nice summary of ML theory within medical education. The author argues that
the traditional medical education model developed by Sir William Osler (termed the natural method of
teaching) is a passive educational process, predicated largely on a learner’s cumulative patient-care
experience. This manuscript argues that the Osler model is inadequate, as it may leave some students
with knowledge deficits. Mastery learning is an educational paradigm that promotes excellent
performance from all learners, with minimal variation in measured outcomes amongst students. Unlike
traditional educational paradigms where lesson time is fixed and outcomes are variable, in the ML model,
the inverse is true. Requirements of a Mastery Learning model include – baseline testing, clear learning
objectives, engagement in educational activities, an explicit standard for passing, formative testing, and
sequential advancement of skills towards competency. This model is appropriate for both undergraduate
and post-graduate medical education trainees, and it helps ensure that all students are meeting the
milestones required.

Cook DA, Brydges R, Zendejas B, Hamstra SJ, Hatala R. Mastery learning for health
professionals using technology-enhanced simulation: a systematic review and meta-analysis.
Academic Medicine. 2013;88(8):1178-86.15

Mastery Learning (ML) is the educational theory upon which Competency Based Medical Education
(CBME) is founded. It is well established that CBME requires individualized instruction. And ML
offers the ability to tailor education to learners’ needs. Given technological advances, simulation-based
education has been proposed to facilitate the incorporation of ML into CBME curriculum. The
quantitative outcomes of ML-based simulation education were reviewed in this study. Across the included studies, the authors found that, overall, ML-grounded simulation had a large positive effect on clinical skills and a moderate effect on patient outcomes. The authors describe the significant
increase in time spent by both teacher and learner as a drawback to ML-based models. The time varied
based on the concept or skill taught with ML based simulation.

Guskey TR. Lessons of mastery learning. Educational Leadership. 2010;68(2):52.16

This review summarizes key concepts of mastery learning, and explains how ML can be applied to other
related instructional models and interventions. The author defines a number of terms pertinent to
mastery learning. Educators must provide a diagnostic pre-assessment for learners, to define pre-
knowledge and describe knowledge and skills requisite prior to the lesson. Mastery learning encourages
high-quality, group-based initial instruction. This primary intervention should be able to adapt to the
context, to relate to students’ interests, and to be specific to the students’ needs. Educators must also offer effective enrichment activities that provide challenging and rewarding learning experiences for students who demonstrate mastery ahead of their peers – this should be material that does not encroach upon the next curricular block. The author suggests that future studies
will focus less on the value of mastery learning itself, and more on improving processes of learning,
instructional materials, and the home learning environment.16
References

1. Bloom BS. Recent developments in mastery learning. Educ Psychol. 1973;10(2):53-57. doi:10.1080/00461527309529091.

2. Block JH, Burns RB. Mastery Learning. Vol 4; 1976. http://blogs.edweek.org/edweek/DigitalEducation/block_burns_1976.Mastery learning.pdf. Accessed December 17, 2018.

3. Le C, Wolfe RE, Steinberg A. The Past and the Promise: Today’s Competency Education Movement. Jobs for the Future. https://www.luminafoundation.org/files/resources/the-past-the-promise.pdf. Accessed December 17, 2018.

4. Carroll JB. The Carroll Model: A 25-Year Retrospective and Prospective View. Vol 18.
https://pdfs.semanticscholar.org/3e99/5718cd78dcd62ce3d06051e147750c0e65f0.pdf.
Accessed December 17, 2018.

5. Keller FS, Sherman JG. PSI, the Keller Plan Handbook: Essays on a Personalized System of Instruction. W.A. Benjamin; 1974.

6. Vogel L. Educators propose "flipping" medical training. Can Med Assoc J. 2012;184(12):E625-E626. doi:10.1503/cmaj.109-4212.

7. White EJ, Lewis JH, McCoy L. Gaming science innovations to integrate health systems science into medical education and practice. Adv Med Educ Pract. 2018;9:407-414. doi:10.2147/AMEP.S137760.

8. Holmboe ES, Edgar L, Hamstra SJ. The Milestones Guidebook. https://www.acgme.org/Portals/0/MilestonesGuidebook.pdf?ver=2016-05-31-113245-103. Accessed December 17, 2018.

9. Sherbino J, Bandiera G, Doyle K, Frank JR, Holroyd BR, Jones G, Norum J, Snider C,
Magee K. The competency-based medical education evolution of Canadian emergency
medicine specialist training. Canadian Journal of Emergency Medicine. 2020
Jan;22(1):95-102.

10. Anderson LW. An empirical investigation of individual differences in time to learn. J Educ Psychol. 1976;68(2):226-233. doi:10.1037/0022-0663.68.2.226.

11. Arlin M, Webster J. Time costs of mastery learning. J Educ Psychol. 1983;75(2):187-195. doi:10.1037/0022-0663.75.2.187.

12.Barsuk JH, Cohen ER, Wayne DB, McGaghie WC, Yudkowsky R. A Comparison of
Approaches for Mastery Learning Standard Setting. Acad Med. 2018;93(7):1079-1084.
doi:10.1097/ACM.0000000000002182.

13.Inui TS. The Charismatic Journey of Mastery Learning. Acad Med. 2015;90(11):1442-1444.
doi:10.1097/ACM.0000000000000915.

14. McGaghie WC. Mastery Learning. Acad Med. 2015;90(11):1438-1441. doi:10.1097/ACM.0000000000000911.

15.Cook DA, Brydges R, Zendejas B, Hamstra SJ, Hatala R. Mastery Learning for Health
Professionals Using Technology-Enhanced Simulation. Acad Med. 2013;88(8):1178-1186.
doi:10.1097/ACM.0b013e31829a365d.

16. Guskey TR. Lessons of mastery learning. Educational leadership. 2010;68(2):52.

CHAPTER 5

Cognitive Theory of Multimedia Learning

Authors: Laurie Mazurik, MD; Elissa Moore, DO; Megan Stobart-Gallagher, DO; Quinn Wicks, MD
Editor: Daniel W. Robinson, MD

A Case
Dr. Sunny Hargrave just had his annual end-of-year review with the department chairman. He has received good evaluations from the majority of the residents complimenting his on-shift and bedside teaching methods. However, his evaluations for his module and grand rounds lectures are below average when compared with his peers. The residents have commented that he often reads from his slides, that his PowerPoint slides are hard to read with too many words, and that they do not get much out of his presentations. His co-faculty have also commented that his presentations often lack images or graphics. His chairman recommends several FOAM sites and blogs for advice on how to create more engaging presentations, but Dr. Sunny Hargrave is still struggling to better incorporate multimedia
into his talks. He just does not understand what he is reading.

He wondered about the following questions:

• What does it mean to use more images?

• What words should I use?

• How will the group get the core content if it is not written down on the slides?

• How will they retain anything from a presentation full of images?

OVERVIEW | COGNITIVE THEORY ON MULTIMEDIA LEARNING
Mayer’s cognitive theory of multimedia learning was developed to foster meaningful learning, or a deeper understanding of the material presented. This theory is based on the premise that students learn more from pictures and words (either spoken or printed) than from words alone, due to the way the brain processes information. This theory is built on
three core principles1,2:

1. Dual Channel Principle: Our brain processes information across two channels
depending on how information is presented: with auditory or visual stimulation.

2. Limited Capacity Principle: The brain can be overwhelmed easily, as no one has infinite working memory, so learners should not be overloaded with information. Our brain will choose what to pay attention to.

3. Active Processing Principle: In order for learning to occur, our brain must convert information from sensory memory to working memory, actively creating mental models of the information as it is presented.

The theory also describes potential pitfalls to avoid that create “cognitive overload,” which occurs when processing demands overwhelm the learner’s capacity. Overload can significantly impede a learner’s ability to achieve meaningful learning, retention, and the ability to solve future problems based on what was taught (transfer).1

MAIN ORIGINATORS OF THE THEORY


•Richard E. Mayer
•Roxana Moreno

Other important authors in this area:


•Ruth Colvin Clark

Background
As discussed above, the original premise of this work was to help multimedia learning accomplish meaningful learning. Multimedia learning is defined as learning from words and pictures, while multimedia instruction is the presentation of words and pictures in order to teach. Meaningful learning is defined as achieving a deep understanding of material, organizing it into a structure one can understand, and then integrating it into one’s existing core knowledge. Studies beyond Mayer’s original work show that long-term transfer and long-term retention occur more frequently when this theory is adhered to in a medical curriculum.2 To understand how multimedia instruction can be a useful teaching modality, one must first understand how the mind works.1

As we revisit the three principles of Mayer’s theory – the dual channel assumption, the limited capacity assumption, and the active processing principle – we must first review how the brain processes information. As described below in Figure 1, words and/or images are conveyed via the dual channels (visual or auditory). They are then taken into sensory memory, where the brain selects elements to process in working memory and finally creates a mental framework or model of the information via active processing. This framework can then be integrated with prior knowledge, applied to new concepts by testing retention, and become a long-term memory. Within this process are many points where cognitive overload could occur, and since we all have limited capacity, deciding how many images, words, and other elements to include when creating a multimedia presentation is a delicate balance. Cognitive overload in the form of interesting but extraneous details not relevant to the core material has been shown to decrease processing during learning.3

Figure 1: Cognitive theory of multimedia learning1

In order to create effective instruction, one must take time to figure out what needs to be learned and to what level of application, as well as how learning will be measured – essentially the HOW of teaching. This begins with the creation of instructional objectives (the WHY of the lesson), followed by the WHAT – the meat of the subject matter to be taught – and ends with the HOW DID WE DO, assessed by measuring retention.

When developing instructional design, consider the above principles of how one learns and how much one can learn and process, as well as ways to avoid cognitive overload. These considerations can be broken down further into1,4-6:

1. Ways to avoid extraneous processing

• Take out extraneous material, although it may be entertaining (Coherence principle)

• Highlight essential core concepts (Signalling principle)

• If using words, put them near their visual counterparts (Spatial contiguity principle)

• If using words, show them at the same time as their visual counterparts (Temporal contiguity principle)

2. Ways to manage essential processing

• Pre-train with key concepts

• Break down into smaller segments controlled by learner (segmental principle)

• Present words in spoken form instead of written form

3. Ways to foster generative processing, allowing the learner to organize material

• Words + pictures > words OR pictures alone – but one does not need spoken words, pictures, AND written text all together.

• Conversational tone > formal tone (e.g., use YOUR instead of THE when describing a
body system)

• Human voice > computerized voice

• Image principle - do not include your own image!

The statement above that words and pictures together are superior to either alone is the core of the multimedia principle of instructional design: learning from words and pictures together shows higher retention and transfer of material when compared with words or pictures alone.1,2,4-6

Modern takes on this Theory


In a simple Google search, there are endless websites, blogs, podcasts, books, and other resources on how to create a stellar presentation. You can think about your favorite TED talks, or any Free Open Access Medical Education (FOAMed) resources, and see how the multimedia theories have been applied (or not applied). Some of the most prolific applications of this occur in formal presentation series such as Keynotable, hosted by Haney Mallemat, MD, and in the “Presentation Zen” series by Garr Reynolds. You can also see this published beyond the medical education literature in business resources like the Harvard Business Review, which highlights various approaches. Ross Fischer (@ffolliet) has also developed the P-cubed approach, or the three P’s of presenting, as an offshoot of parts of multimedia theory.

Where this is seen very commonly in Emergency Medicine education right now is in segmented video resources coupled with podcasts or verbal discussions. Hippo EM is an excellent example of this, where simple video images are played while verbal discussion takes place. At the end of each video series, there are questions to test retention, and as learners move forward through the videos, which are grouped by body system, they can build on previous knowledge.

Other Examples of Where this Theory Might Apply


In medical education, this is obviously a theory upon which classroom learning can be built. Beyond that, a flipped classroom approach can present materials in a multimedia module (introducing key concepts) ahead of time, and more detailed in-person lectures that verbalize most of the information alongside highlighted images can then reaffirm those key concepts. This can be stretched from undergraduate medical education to graduate medical education and beyond for faculty development.

In the clinical arena, patients benefit tremendously from visual stimuli when
understanding disease processes or plans. We do not always remember to use appropriate layman’s terms, despite literature instructing us to write discharge summaries at a 6th-grade reading level.7 If we begin using this modality in the clinical arena – with whiteboards at the bedside, or iPad and digitally recorded instructions with representative drawings – we may be able to improve both patient satisfaction and potentially long-term outcomes through better patient understanding.

Limitations of this Theory


The theory of multimedia learning itself is very basic and seems like an easy concept for
teaching: Words + Images are better than either alone. However, the limitation can lie in the
attempted execution of this theory. Creation of instructional objectives could be an entire course unto itself, so creating objectives and then translating them into a useful multimedia presentation that produces meaningful learning requires time, patience, and a lot of practice from module creators – both for creation of the material and for its execution if done in a live setting.

Another potential limitation is that learners vary in their ability to process information as well as in their thresholds for cognitive overload. Although the idea of fixed learning styles (primarily auditory vs. tactile, etc.) has largely been debunked, using the dual channel process may work really well for some and not well for others, based on how they have adapted their processing over time. With adult learners, you may have some stuck in old ways whose brains will have to be rewired to process, learn, and retain in this format. With regard to overload, a lot of conversation may actually overload one learner, while polite conversational instruction may cause others to completely tune out. It may be a challenge to use a blanket modality to instruct a large group of learners.

ANNOTATED BIBLIOGRAPHY
Mayer R. Applying the science of learning to medical education. Medical Education. 2010;
44:543-549.4
This article mirrors much of the original theory, but specifically lays out more easily understandable principles of reducing extraneous processing, managing essential processing, and fostering generative processing through specific medical education examples. Prior to that, it focuses on the creation of instructional objectives, which should serve as the basis and framework for the creation of your educational content.

Huang C. Designing high-quality interactive multimedia learning modules. Computerized


Medical Imaging and Graphics 29 (2005); 223-233.8
This really speaks to a generational approach to incorporating technology into education, and as we see a trend of students going online for more easily digestible information (e.g., FOAMed), it is vital to know how this content can be created using researched methods. This paper describes best-practice guidelines for the creation of educational multimedia design from concept to reality in a step-wise fashion:
1) Understanding the problem and needs while creating goals.
2) Designing content.
3) Build interactivity into the module for self-directed use.
4) Test and evaluate.
5) Take feedback and redesign.
While not all educators will be building an interactive modular curriculum, the basics of design, creation, and assessment with redesign cannot be ignored.

Clark R, Mayer R. E-Learning and the science of instruction: Proven guidelines for consumers and designers of multimedia learning. 2011. Wiley and Sons.9
While this last suggestion is a text rather than a key paper, it has several chapters dedicated specifically to teaching an instructor how to create electronic educational resources, including chapters on multimedia learning that focus on using words and graphics together in lieu of either alone. It also hosts several chapters on additional principles, including the personalization, redundancy, coherence, and contiguity principles, as well as segmenting lessons. While specifically focusing on multimedia learning, within its 23 pages it takes the reader on a journey through the background material as well as recommendations for illustrating specific content types, including topic maps to help create organization and prompt the creation of mental mapping for learner processing.

Returning to the case...
Dr. Sunny Hargrave, having gained a better understanding of the dual channel principle, has learned that his learners would benefit most from adding pictures to his lectures, so that he may facilitate the construction of the mental framework needed not only to keep his learners engaged, but also to support the retention of meaningful concepts. By adding images to the spoken word of his lecture, he is able to recruit an entire additional sensory modality and its neural power to boost the memory formation of his learners. He now knows that his learners retain concepts better with a combination of words and pictures. By replacing the walls of written text in his previous lectures with imagery, he is able to better engage his learners with his spoken word, as their mental energies will be spent absorbing what he says instead of reading words on the screen. Through his
understanding of the Limited Capacity Principle, he knows that large amounts of text are far more
likely to overwhelm his learners than help them follow along with his lecture.

References

1. Mayer RE, Moreno R. Nine ways to reduce cognitive load in multimedia learning. Educ
Psychol. 2003;38(1):43–52.

2. McGraw-Hill Education Blog. Richard Mayer’s Cognitive Theory of Multimedia Learning | McGraw-Hill Education Canada. https://www.mheducation.ca/blog/richard-mayers-cognitive-theory-of-multimedia-learning/. Accessed July 22, 2018.

3. Mayer RE, Griffith E, Jurkowitz IT, Rothman D. Increased interestingness of extraneous details in a multimedia science presentation leads to decreased learning. J Exp Psychol Appl. 2008;14(4):329.

4. Mayer RE. Applying the science of learning to medical education. Med Educ.
2010;44(6):543–549.

5. Mayer’s Theory of Multimedia Learning - YouTube. https://www.youtube.com/watch?v=0aq2P0DZqEI. Published June 2, 2017. Accessed July 22, 2018.

6. Rahul Patwari, Robert Cooney. Multimedia Principles; 2015. https://www.youtube.com/watch?v=BcWSUnXz8kw. Accessed July 22, 2018.

7. Choudhry AJ, Baghdadi YM, Wagie AE, et al. Readability of discharge summaries: with
what level of information are we dismissing our patients? Am J Surg. 2016;211(3):631–
636.

8. Huang C. Designing high-quality interactive multimedia learning modules. Comput Med Imaging Graph. 2005;29(2-3):223–233.

9. Clark RC, Mayer RE. E-Learning and the Science of Instruction: Proven Guidelines for
Consumers and Designers of Multimedia Learning. John Wiley & Sons; 2016.
CHAPTER 6

Validity
Authors: Rebecca Shaw, MBBS; Carly Silvester, MBBS
Editor: Dimitrios Papanagnou, MD, MPH

A Case
Dr. Carmody was excited. As a junior faculty member, she was attending her first clinical
competency committee meeting for her residency program. She had so many ideas that could
potentially help improve the program. She knew everyone else on the committee was more
experienced than she was, so she hoped her enthusiasm would compensate for her lack of formal training
in medical education.

The meeting was not progressing as she had expected. The written assessments and attestations from
faculty from earlier in the academic year had failed half of the first-year residents. Among the committee members there was avid debate and conversation about the validity and reliability of assessment methods. She wasn’t sure what they meant by the terms assessment, program evaluation, and scoring inferences. With all this new terminology, Dr. Carmody felt herself contributing less and
less to the discussion.

At the end of the meeting, the program director assigned tasks for the next meeting. She heard her
name. “Dr. Carmody, it would be great if you could present your insights into the validity argument
behind our written assessment by faculty members as a tool in our program of assessment. Could
you have that ready for the next meeting?”

Dr. Carmody was stressed. She was most concerned about the implications of the current scores, as several residents would be faced with remediation and extended training time. However, written statements by faculty members were a cornerstone of resident assessment at her program. If the committee was questioning their validity… was the whole program invalid?
She was confused by the term “validity argument”, and the task ahead of her seemed daunting. Was
she reviewing written comments? Or scores? Or both?

Where was she possibly going to start?

OVERVIEW | VALIDITY
Assessment is an integral part of medical education, and the validation of an assessment
is vital to its use. All assessments aim to facilitate defensible decisions about those being
assessed.1 To make these decisions, evidence needs to be evaluated in order to understand
the strengths and weaknesses of the assessment in question.

Validity and validation are two separate terms with distinct meanings. Validity refers to a
conceptual framework for interpreting evidence, whereas validation is the process of
collecting and interpreting evidence to support those decisions.1

The current Standards for Educational and Psychological Testing define validity as “the
degree to which evidence and theory support the interpretations of test scores for
proposed uses of tests.”2 It is not the assessment or test itself that is validated, but rather
the meaning of the test scores, as well as any implications resulting from them.3 The
implications and resulting outcomes are the most important inferences in the validity
argument.1

Validation evaluates the fundamental claims, assumptions, and inferences linking assessment scores with their intended interpretations and uses.1 Validity is not a property
of the test itself, but rather refers to the use of a test for a particular purpose. To evaluate
the suitability of a test for a particular purpose requires multiple sources of evidence.4
Sufficient evidence is also required to defend the use of the test for that purpose.3 The
extent to which score meaning holds across population groups, settings, and contexts is
an ongoing question, and is the main reason that validation is a continuous process rather
than a static, one-time event. Rather than being dichotomous, validity is considered a
matter of degree.4

MAIN ORIGINATORS OF THE THEORY


•Samuel Messick
•Michael Kane

Other important authors in this area:


•David Cook

Background
Validity theory has significantly evolved over time. Initially, validity was divided into three
main types:

• Content validity, which relates to the creation of the assessment items;

• Criterion validity, which refers to how well scores correlate with a reference-standard
measure of the same phenomenon;

• Construct validity, in which intangible attributes (i.e., constructs) are linked with
observable attributes based on a conception or theory of the construct.1

Messick’s Validity Framework

Educators, however, recognized that content validity nearly always supports the test, and
that identifying and validating a reference standard is very difficult.1 Over 20 years ago,
Messick proposed an alternative unified framework in which all validity is considered
construct validity and consists of evidence collected from 6 different aspects.3 These aspects
can be briefly described as follows:

1. The content aspect includes evidence of content relevance and evidence that assessment
content reflects the construct it is intended to measure.

2. The substantive aspect refers to theoretical and empirical analyses of the observed
consistencies in test responses. It evaluates the extent to which responses from
examinees or raters align with the intended construct.

3. The structural aspect evaluates how well the internal structure of the assessment
reflected in the scores is consistent with the construct domain selected. This can include
measures of reliability across assessment items or stations.

4. The generalizability aspect evaluates how efficiently score properties and interpretations
generalize to and across population groups, settings, and tasks.

5. The external aspect includes evidence of the statistical associations between assessment
scores and another measure with a specified theoretical relationship. This may include
criterion relevance, applied utility, and evidence from multi-trait, multi-method
comparisons. The relationship may be positive for measures of the same construct or
negligible for independent measures.

6. The consequential aspect appraises the actual and potential consequences of test use,
including the beneficial or harmful impact of the assessment itself and the decisions that
result. It is related to the issues of bias, fairness, and distributive justice.3

The current Standards for Educational and Psychological Testing place emphasis on five of
the sources of evidence proposed by Messick. These are content, response process, internal
structure, relations with other variables, and consequences evidence; the generalizability
aspect, however, is not included in the standards.1

Kane’s Validity Framework

In 2006, Kane proposed an alternative unifying approach, instead identifying four
inferences in the validity argument. These are:

1. Scoring, which refers to translating an observation into one or more scores;

2. Generalization, which involves using the score as a reflection of performance in a test
setting;

3. Extrapolation, which is using the scores as a reflection of real world performance;

4. Implications, which involves applying the scores to inform a decision or action.1

Kane stipulated that the implications and associated consequences are the most important
inferences in the validity argument. Evidence is needed to support each inference, and
should focus on the most questionable assumptions in the chain of inference. The
framework is versatile and can apply to all forms of assessment equally, including
quantitative or qualitative assessments, individual tests, and programs of assessment. This
framework can help educators identify which are the most important pieces of evidence
when planning an evaluation and identifying evidence gaps.1

Modern takes on this Theory


The unified view of construct validity is widely endorsed, but there is ongoing controversy
about the definition of validity. Messick’s definition incorporates both the accuracy of score
inferences and consequential validity, and it has been argued that this definition is too
complicated. Cizek proposed that validation of score inferences and justification of test use
should be considered as two parallel, but equally valued, endeavors. Evaluation of the
technical test score inferences is considered separate from evaluation of the justification of
test use, which is a matter of social values.5

Another alternative model proposed by Lissitz and Samuelsen in 2007 separates test
evaluation into internal and external aspects. In this model, validity mainly concerns itself
with the internal aspects of a test which can be studied in relative isolation from other tests.
The external aspects are then only validated if necessary. Validity is approached
independently of the testing context and purpose of the test. Instead, the main focus is on
evaluating test content. This model therefore implies that validity is a property of the test
itself, as validation of the construct is separated from validation of the test itself.4

Finally, a broader conceptual framework may be considered, which analyzes assessment
procedures with varying degrees of detail along the continuum from micro-validation to
macro-validation. Macro-validation is concerned with the overarching validity claim,
utilizing a broad, holistic evaluation but providing the least diagnostic information. Micro-
validation, on the other hand, is concerned with the underpinning validity claims, utilizing a
narrow, targeted evaluation and providing the most diagnostic information. This
framework allows a different way of thinking about validation evidence by distinguishing
between types of inquiry which can be outcome-related or process-related. Evidence from
multiple sources, rather than just the five sources in the current standards, is considered
legitimate.6

Other Examples of Where this Theory Might Apply

High-stakes examinations are often the final summative hurdle in professional education.
The consequences of a pass or fail result are far-reaching for both the doctor and the public.
Attention to consequential validity and to the process used to determine scores is critical in
ensuring stakeholders view the results as credible.7

Programmatic and competency-based assessment have increasingly become the focus of
medical education. One of the tenets of competency-based medical education is that
training programs must use assessment tools that meet minimum standards of quality.8 As
a result, validity and validation are increasingly being studied. A thorough understanding
of the principles of validity and its measurement would be of benefit in many domains of
medical education.

One of the most validated and studied tools takes the form of direct observation of clinical
skills (e.g., the American Board of Internal Medicine’s Mini-CEX).9 Knowledge of its
validity and robustness may lead to weighing its results more
heavily in summative assessments.

Another area where validity theory has increasing application is in medical simulation.
Simulation provides the opportunity for deliberate practice, and acts as a surrogate for
meaningful educational outcomes.10 Inherent in its design is the ability to control a large
number of variables, enabling targeting of a specific topic or skill. This control allows the
application of a validity framework to ensure the designed instrument measures what is
purported.

Assessment of professionalism is often described as one of the more challenging aspects of
performance reviews. Consequently, many assessment tools have been developed to assess
professionalism. Having a good grasp of the principles of validation frameworks would
enable better evaluation of assessment tools11 and more accurate reflections.

Limitations of this Theory

Though validity is considered essential to quality educational assessment, there is no
consistent terminology or agreed-upon exemplar of good validation practice.13,14 This has
been attributed to the diverse backgrounds of the practitioners contributing to health
professions education, including psychology, sociology, and education.13

The complexity of Messick’s model has been widely criticized, along with its lack of
practical guidance.13 The sense that the task is insurmountable, given the long list of
hypotheses and settings to address, can lead practitioners to consider even a small amount of
evidence of any type sufficient,15 potentially resulting in the development of
suboptimal validation programs.

Lastly, the emphasis placed on morality and social consequences by Messick (and, to a
lesser extent, by Kane) is considered by some to be more of a judgement call, and can
therefore lead to confusion between fact and personal preference.16

ANNOTATED BIBLIOGRAPHY
Messick, S. (1995). Standards of Validity and the Validity of Standards in Performance
Assessment. Educational Measurement: Issues and Practice, 14(4), pp. 5-8.

Considered to be the seminal article unifying the concepts of validity into a defined framework.

Kane, M. (2013). The Argument-Based Approach to Validation. School Psychology Review,
42(4), pp. 448-457.

Kane’s modern framework is explained in this paper. Covering a brief history of validity theory, he then
goes on to offer his version of a simplified, step-wise template for validation. The defined approach is to
state what is being claimed and evaluate the claims being made.

Cook, D., Brydges, R., Ginsburg, S. and Hatala, R. (2015). A contemporary approach to
validity arguments: a practical guide to Kane's framework. Medical Education, 49(6),
pp.560-575.

This key paper explains the utility of Kane’s framework in medical education, and highlights the use of
validation as a continuous process. It provides clarity to the argument that the purpose of validation is to
collect evidence that evaluates whether or not a decision and its attendant consequences are useful. The
core elements of Kane’s framework (i.e., scoring, generalization, extrapolation, and implications) are
explained in practical terms, along with examples of elements of evidence that may be used to test each.
Finally, it provides an example of the application of Kane’s framework to familiar testing tools including
assessment of procedural skills, and a qualitative (in-training narrative) assessment.

Downing, SM. (2003) Validity: on the meaningful interpretation of assessment data. Medical
Education, 37, pp.830-837.

Downing’s paper is a thorough explanation of construct validity specific to medical education assessment,
and closely reviews the five types of validity as outlined by the Standards.2 In order to enhance the
understanding of each of the validity sources, this paper has constructed example assessments, and
provides validity evidence for each. Viewing validity as closely aligned with the scientific method of
theory development, Downing provides a solid argument for validity as a marker for quality.


Returning to the case...
“Thank goodness for Kane’s framework,” Dr. Carmody thought at the end of the clinical competency
committee. She had managed to explain the concept of validity well, and identified some fixable
problems in their current written assessment tool.

In reviewing resident evaluations and scores, she noted that there were limited details about observed
behaviors by faculty, reflecting poor question construction. She also noted that evaluations were
weighed more heavily when completed by non-clinical faculty, and wondered if this parameter
introduced a bias.

Dr. Carmody found that written assessments were incongruent with patient feedback forms, but
were consistent with in-service training examination scores. She began to wonder what this meant
for extrapolation.

Using this framework, she had put forward some clear suggestions to improve the validity of the
current written evaluation model within the residency. The meeting progressed with further
discussion of how the assessment process for the residents could be improved. She was happy that the
resident evaluations were being reconsidered, and felt optimistic that suggested improvements would
make the evaluation system a more useful tool in the future for her residents.
References
1. Cook, D., Brydges, R., Ginsburg, S. and Hatala, R. (2015). A contemporary approach to validity
arguments: a practical guide to Kane's framework. Medical Education, 49(6), pp.560-575.

2. American Educational Research Association, American Psychological Association, and
National Council on Measurement in Education. Standards for Educational and Psychological
Testing. Washington, DC, 2014, p11.

3. Messick, S. (1995). Standards of Validity and the Validity of Standards in Performance
Assessment. Educational Measurement: Issues and Practice, 14(4), pp. 5-8.

4. Sireci, S. (2007). On Validity Theory and Test Validation. Educational Researcher, 36(8),
pp.477-481.

5. Cizek, G. (2012). Defining and distinguishing validity: Interpretations of score meaning and
justifications of test use. Psychological Methods, 17(1), pp.31-43.

6. Newton, P.E. (2016). Macro- and Micro-Validation: Beyond the “Five Sources” Framework
for Classifying Validation Evidence and Analysis. Practical Assessment, Research &
Evaluation, 21(12). Available online: http://pareonline.net/getvn.asp?v=21&n=12

7. Downing, SM. (2003) Validity: on the meaningful interpretation of assessment data. Medical
Education, 37, pp.830-837.

8. Holmboe ES, Sherbino J, Long DM, Swing SR, Frank JR & for the International CBME
Collaborators (2010). The role of assessment in competency-based medical education. Medical
Teacher, 32(8), pp.676-682.

9. Cook, D., Hatala R. (2016). Validation of educational assessments: a primer for simulation and
beyond. Advances in simulation. 1(31)

10.Kogan, JR., Holmboe, ES., Hauer KE. (2009) Tools for direct observation and assessment of
clinical skills of medical trainees. JAMA. 302(12) pp.1316-1326.

11.Clauser, B.E., Margolis, M.J., Holtman, M.C. (2012). Validity considerations in the assessment of
professionalism. Advances in Health Sciences Education, 17(2), pp. 165-181.

12.Kane, M. (2013). The Argument-Based Approach to Validation. School Psychology Review,
42(4), pp. 448-457.

13.St-Onge, C., Young, M., Eva, KW., Hodges B. (2017). Validity: one word with a plurality of
meanings. Advances in Health Sciences Education, 22, pp. 853-867.

14.Royal KD. (2017). Four tenets of modern validity theory for medical education assessment and
evaluation. Advances in Medical Education and Practice, 8, pp. 567-570.

15.Shepard LA. (1993) Evaluating test validity. Review of research in education. 19. pp. 405-450.

16.Lees-Haley PR. (1996) Alice in Validityland, or the dangerous consequences of consequential
validity. American Psychologist, 51(9), pp. 981-983.

CHAPTER 7

Programmatic Assessment
Authors: Elizabeth Dubey, MD; Christian Jones, MD; Annahieta Kalantari, DO
Editor: Sara M. Krzyzaniak, MD

A Case
Sonja excelled in medical school. She flew through her nonclinical years, earned honors on her
rotations, and got outstanding letters of recommendation from her mentors.

Intern year was… different.

Sonja struggled to find her way in the busy urban hospital where she trained. Her patient care
dedication never lapsed, but she constantly felt she was barely keeping up with the endless flow of
test results and pending procedures. Her interactions with senior residents and attending physicians
were all business: here’s what needs to be done, let me know when it is complete. This was mirrored
in her regular summative evaluations: “Doing fine”, “Keep reading”, and “Hard worker” were the
norm.

When her program director (PD) met with Sonja six months into internship, feedback was limited.
“I haven’t heard about any major problems,” the PD noted. “How have you been doing?”

The young doctor was reluctant to discuss her uncertainties, and stuck with “Fine, I think.” The two
made plans to discuss her progress again after the annual inservice exam.

That, unfortunately, did not go well. Sonja was shocked to find she scored in the 14th percentile on
the test and was already expecting the PD’s call. More shock was to come, however; much to her
surprise, Sonja was placed on academic probation. She’d be assigned a mentor and a “learning
specialist” with whom she was required to meet every other week. If she continued to perform “this
poorly”, she was told, she’d be asked to leave the program. The PD asked her if she had considered
another specialty.

OVERVIEW | PROGRAMMATIC ASSESSMENT
Programmatic assessment utilizes data points from various sources and multiple
assessment tools in order to make high-stakes decisions and facilitate learning.

We first must make the distinction between assessment of learning and assessment for
learning. Assessment of learning is the traditional summative assessment which is
familiar to all of us. This may take the form of a grade or formal report card. Assessment
for learning combines the assessment process with the educational process, allowing
education to be tailored to the needs of individual students in an ongoing fashion.1 The
goal is to make assessment an integral and relevant component of education.
Programmatic assessment considers assessment to be as important as the curriculum
itself, thus requiring intense planning and review. This type of an assessment program
can allow educators to use the assessment as a teaching tool in itself.

A program of assessment is used to collect and combine information from various
assessment sources to inform about the strengths and weaknesses of each individual
learner. An important part of innovative assessment programs is that information from all
assessment sources can be used to inform each competency domain. The deficiencies of
one instrument can be compensated for by the strengths of another instrument, leading to
a diverse spectrum of complementary measurement tools to understand competence as a
whole.

When using programmatic assessment to review a learner, individual data points,
garnered from individual assessments, are maximized for learning and feedback value. In
contrast, high-stakes decisions on a learner’s overall competency are based on the
aggregation of many data points. Thus, no high-stakes decisions are made without a
detailed collection of information that is supported by thorough measures to ensure their
reliability.

MAIN ORIGINATORS OF THE THEORY


•Cees P. M. van der Vleuten
•Lambert W. T. Schuwirth

Other authors in this area:


•Roger Ellis
•Joost Dijkstra

Background
The idea of assessment for learning is not new. It was proposed in 1989 by Martinez &
Lipson.1 Their interpretation was limited to increased frequency of standardized testing
and the use of more feedback, but their views demonstrated a growing awareness of
assessment as an important aspect of education.

Over the 20th century, the behavioristic approach to learning prevailed in education, with
the belief that competency was achieved after multiple small steps were mastered. In this
traditional construct, the onus is on the learner to pass a module or test. If learners fail, they are
remediated until they pass. In this way, assessment is viewed as a checklist. This
worked well in a mastery learning view of education theory but now shares the spotlight
with newer theories.

Modern education builds on constructivist learning theories in which learners
create their own knowledge and skills through integrated programs that guide and support
competence. This new way of understanding education allows educators to take a fresh
look at assessment, and in 2005, van der Vleuten & Schuwirth proposed the notion of
programmatic assessment in medical education.2

Programmatic assessment uses traditional assessment instruments to augment a more
modern approach. In this context, each specific assessment is chosen to combine with
others to form a robust program catered to the learner. This helps mitigate limitations in a
single assessment as the combination aims to create an overall thorough assessment
program.

Soon after the inception of programmatic assessment, Dijkstra published a set of program
design guidelines to help make this idea more practical.3 However, as these guidelines
were relatively generic, van der Vleuten proposed an integrated model for programmatic
assessment that optimized both the learning function and the decision function in
competency-based educational contexts.4 This integrated model is specific to constructivist
learning programs.

Modern takes on this Theory


Programmatic assessment works with the concepts of constructivist learning and
longitudinal competence development. There is strong emphasis on using feedback to

optimize individual learning. There is also a focus on tailoring remediation to the
individual student. For many educational programs, this is a radical change.

The Accreditation Council for Graduate Medical Education (ACGME) uses a set of 6 core
competencies to help define the foundational skills every practicing physician should
possess. These include practice-based learning and improvement, patient care and
procedural skills, systems-based practice, medical knowledge, interpersonal and
communication skills, and professionalism.5 Most residency assessment programs use
instruments to evaluate the ACGME competencies with the assumption that the
competencies are stable, generic and independent of one another. As a result, assessment
innovations have previously been via the development of new instruments to directly
measure one of these constructs.1 This narrow view has many limitations in the potential
for assessment to garner valuable information. For example, if in one assessment
instrument, competence is acquired by completing a specific set of steps, assessment results
must indicate whether or not a step was finished successfully.

Some traditional assessments, such as multiple-choice exams, are often designed to
eliminate information and come to a dichotomous decision. Using the “pass-fail” multiple-
choice exam as an example, valuable assessment data are discarded along the way: the
answers not chosen by the learner, the specific questions that were
answered correctly versus those answered incorrectly, and even the percentage correct.

Aiming to pass examinations such as this can lead to poor learning habits in learners. It can
also encourage a “grade culture”, where achieving the highest grade is the main objective.5
Many assessment programs pursue objectivity over subjectivity as it is easier to summarize
and compare objective information, but choosing to ignore the details of well-gathered
subjective evaluations discards the great value of this subjective information.

Although programmatic assessment is well received in educational practice, many find
it complex and overly theoretical, and many training programs
have yet to develop this type of assessment program. In response, van der Vleuten
published a paper in 2015 that provides concrete steps to implement programmatic
assessment in an educational program:6

1. Develop a master plan6

An overarching structure must be chosen, usually in the form of a competency framework.
Here, assessments are taken as single data points with the development of a continuum of
stakes, ranging from low- to high-stakes decisions. Depending on the curriculum and the
phase of study, the master plan will contain a variety of assessments, a mixture of
standardized and non-standardized methods, and the inclusion of modular as well as
longitudinal assessment elements.

2. Develop examination regulations that promote feedback orientation6

Pass-fail decisions should not be made on the basis of individual data points. Avoid
connecting credits to individual assessments as this raises their stakes, causing learners to
focus on passing a test over quality feedback and follow-up. In all communication, and in
examination regulations, the low-stakes nature of individual assessment should be
apparent.

3. Adopt a robust system for collecting information6

Create electronic student portfolios. This would serve to 1) provide a storehouse for
feedback including assessment feedback, activity reports, learning outcome products and
reflective reports, 2) facilitate the administrative and logistical aspects of assessment, and 3)
allow for quick synopsis of gathered information. This would be easily accessible from
anywhere and adjusted to fit the needs of the assessment program.

4. Assure every low-stakes assessment provides meaningful feedback for learning6

Meaningful feedback takes many forms, including review of a multiple-choice test, score
reports from standardized tests, and skills-domain or longitudinal overviews of progress test
results. Effective feedback from teachers should also be obtained.

5. Provide mentoring to learners6

Feedback should be part of a reflective dialogue, and mentoring is an effective way to set
this up.

6. Ensure trustworthy decision-making6

High-stakes decisions should be entrusted to individuals with good professional
judgement, and procedural measures should be in place to ensure this.

7. Organize intermediate decision-making assessments6

Intermediate (i.e. mid-course) decisions add credibility to high-stakes decisions and keep
the learner in the loop on potential future high-stakes decisions. Intermediate assessments
are based on fewer data points than final decisions. Their stakes are in between low-stake
and high-stake decisions. Intermediate assessments are diagnostic (how is the learner
doing?), therapeutic (what might be done to improve further?), and prognostic (what might
happen to the learner if the current development continues to the point of the high-stake
decision?)

8. Encourage and facilitate personalized remediation6

This should result from an ongoing reflective process and is always personalized. The
curriculum should be flexible to accommodate the successful remediation of a struggling
learner.

9. Monitor and evaluate the learning effect of the program and adapt6

10. Use the assessment process information for curriculum evaluation6

Assessment information should not only promote learning and determine whether learning
outcomes have been achieved, but should also be used to evaluate the curriculum itself.

11. Promote continuous interaction between the stakeholders6

In any educational system, it is imperative that those who design the system
regularly seek input from stakeholders. This should occur not only in the initial
design phase, but also throughout the system’s existence - fostering continuous quality
improvement for the assessment system.

12. Develop a strategy for implementation6

Programmatic assessment requires a culture change and is not easy to achieve in an existing
educational practice.5 Programs that have implemented a structured assessment program
have found it useful in identifying deficits early and in creating early intervention
programs.6,7

Other Examples of Where this Theory Might Apply


Programmatic assessment depends greatly upon feedback, multiple assessments that are
individually low-stakes, and an environment conducive to constructivist learning. This
makes programmatic assessment ideal for clinical training; though clinical decisions are
much more high-stakes than those on any written examination, thorough supervision and
mentoring allow ongoing subjective assessments of trainees while ensuring patient safety.

In addition to the culture shift toward myriad low-stakes subjective evaluations required
for programmatic assessment to be successful, it is easy to imagine that intense assessment
of learners in a trustworthy and fair process requires a great deal of time and effort. Indeed,
it is likely that many such programs have failed to meet the burdens of true programmatic
assessment for this reason. Programs with many learners for a short period of time (for
instance, medical student clinical clerkships) may have great difficulty implementing
programmatic assessment for their curriculum. Though still possible, this demands
significant training of faculty to provide meaningful feedback, dedication of assessors to
collect and make sense of learning portfolios, and availability of technology to help more
easily combine the individual evaluations into a body of work to provide accurate and
reliable assessments.

ANNOTATED BIBLIOGRAPHY
Schuwirth LW, van der Vleuten CP. Programmatic assessment: From assessment of learning
to assessment for learning. Med Teach. 2011;33(6):478-485.

This paper is an early publication about programmatic assessment. The authors describe the theory and
its purpose and potential impact in education.

van der Vleuten CP, Schuwirth LW, Driessen EW, Govaerts MJ, Heeneman S. 12 Tips for
programmatic assessment. Med Teach. 2014:1-6.

Programmatic assessment-for-learning can be applied to any part of the training continuum, provided
that the underlying learning conception is constructivist. This paper provides concrete recommendations
for implementation of programmatic assessment.

Ellis R. Programmatic Assessment: A Paradigm Shift in Medical Education. All Ireland
Journal of Teaching and Learning in Higher Education (AISHE-J). 2016;8(3).

The author explains programmatic assessment in a more understandable way and discusses how it has
shifted the relationship between curriculum and assessment.

Perry M, Linn A, Munzer BW, et al. Programmatic Assessment in Emergency Medicine:
Implementation of Best Practices. J Grad Med Educ. 2018;10(1):84-90.

This paper describes best practices on how to implement an assessment program in Emergency Medicine.

Chan T, Sherbino J, McMAP Collaborators. The McMaster Modular Assessment Program
(McMAP): a theoretically grounded work-based assessment system for an emergency
medicine residency program. Academic Medicine. 2015 Jul 1;90(7):900-5.

Li SA, Sherbino J, Chan TM. McMaster Modular Assessment Program (McMAP) through the
years: Residents' experience with an evolving feedback culture over a 3-year period. AEM
Education and Training. 2017 Jan;1(1):5-14.

Acai A, Li SA, Sherbino J, Chan TM. Attending emergency physicians’ perceptions of a
programmatic workplace-based assessment system: the McMaster Modular Assessment
Program (McMAP). Teaching and Learning in Medicine. 2019 Aug 8;31(4):434-44.

This series of papers highlights one specific emergency medicine program of assessment as well as the
reactions of both resident physicians and faculty members within the system. Of note, the full library of
all the McMAP assessment tools is freely available via the ALiEM Library (aliem.com/library). This group
has continued on to publish more research as well, so be sure to check out some of their other works too.


Limitations of this Theory

Building a program of assessment from the ground up is no easy task. It requires multiple
active participants with in-depth knowledge of medical education and the evaluation
process. It is also essential that the necessary steps are done well so that the resulting
infrastructure is sustainable and can withstand changes within the program.

Such a multifaceted program of assessment can be difficult to maintain. A high degree of
organization and oversight from the program is needed for its success. Much of
this responsibility lies on the leading educators, who are likely the program director or
associate program directors in a residency program. Yet given the necessary time
commitment, responsibilities will likely have to be delegated, and this is where an
unraveling of organization may occur. With good leadership, this is an unlikely outcome
but one that should be noted.

A program of assessment depends heavily on the judgement of professionals. Much of this
happens at the low-stakes level of individual feedback. These single feedback points are
combined qualitatively and direct the higher-stakes decisions about learners. Given the
fallibility of individual educators, a program runs the risk of basing its assessments heavily on
biased feedback. Some of this can be mitigated by training and by having multiple data points
from which to draw conclusions, but it is difficult to eradicate completely.

Returning to the case...


The program has a fairly robust assessment program and the program director is surprised to find
that Sonja feels blindsided by her remediation.

Reflecting back on their practice and on the literature around programmatic assessment, they
identify that they missed a crucial step in setting up their program of assessment. While they’ve
done well managing decisions that are intermediate-stakes (such as whether to put a resident on
probation) or high-stakes (like whether to promote a resident to the next year), they have failed to
establish a healthy system for low-stakes assessments due to inadequate and infrequent feedback.
Without this, the higher-stakes decisions are of little value. Additionally, the program recognizes the
lack of an evaluation process for Miller’s pyramid level of “knows”. Sonja’s probation was the result
of one data point from a mastery test. Had she received more frequent low-stakes assessments of her
medical knowledge, her poor in-training examination (ITE) score might have been avoided.
The program sees the flaws in their feedback system and decides to train their faculty in quality
feedback, with the expectation that trainers give on-shift and post-procedure feedback both verbally
and through an online evaluation that becomes part of the resident’s e-portfolio. Additionally, the
program dedicates resources to develop its faculty in bedside teaching methods such as the Socratic
method to allow for more frequent assessments of learner foundation knowledge.

The program director assigns Sonja a mentor. Together, they review the analysis of her ITE scores
and reflect on why she scored low. From this they create a personalized and detailed study plan. The
mentor frequently “checks in” with Sonja to make sure she is on track and help facilitate her
progression. Sonja knows her clinical and academic weaknesses and has a plan for growth. She now
feels the program is invested in her and her success.

References

1. Schuwirth LW, van der Vleuten CP. Programmatic assessment: From assessment of learning to
assessment for learning. Med Teach. 2011;33(6):478-485.

2. van der Vleuten CP, Schuwirth LW. Assessing professional competence: from methods to
programmes. Med Educ. 2005;39(3):309-317.

3. Dijkstra J, Galbraith R, Hodges BD, et al. Expert validation of fit-for-purpose guidelines for
designing programmes of assessment. BMC Med Educ. 2012;12:20.

4. van der Vleuten CP, Schuwirth LW, Driessen EW, et al. A model for programmatic assessment
fit for purpose. Med Teach. 2012;34(3):205-214.

5. Holmboe, E.S., Edgar, L., & Hamstra, S.J. The Milestones Guidebook. Accreditation Council for
Graduate Medical Education; 2016.

6. van der Vleuten CP, Schuwirth LW, Driessen EW, Govaerts MJ, Heeneman S. 12 Tips for
programmatic assessment. Med Teach. 2014:1-6.

7. Ellis R. Programmatic Assessment: A Paradigm Shift in Medical Education. All Ireland Journal
of Teaching and Learning in Higher Education (AISHE-J). 2016;8(3).

8. Hauff SR, Hopson LR, Losman E, et al. Programmatic assessment of level 1 milestones in
incoming interns. Acad Emerg Med. 2014;21(6):694-698.

9. Perry M, Linn A, Munzer BW, et al. Programmatic Assessment in Emergency Medicine:
Implementation of Best Practices. J Grad Med Educ. 2018;10(1):84-90.

CHAPTER 8

Self-Assessment Seeking
Authors: Nilantha Lenora, MD; Layla Abubshait, MD; Manu Ayyan, MBBS
Editor: Benjamin H. Schnapp, MD MEd

A Case
Adam is a PGY1 in a four year Emergency Medicine (EM) residency program. He ranked near the
bottom of his class in medical school and had below average USMLE scores. Many of his recent shift
evaluations have expressed concerns about his clinical knowledge and skills. Adam rationalized
these poor shift evaluations as flawed with a variety of justifications, including having difficult pa-
tients who were “poor historians,” feeling sleep deprived, and just plain “bad luck” with unusual
presentations of diseases. He was called to attend his first meeting with his program director (PD)
six months into his intern year and asked to complete a self-evaluation prior to the meeting.

Adam enjoys his current EM training experience. Adam feels extremely confident in his medical
knowledge, procedural skills, and ED workups. He also feels he is above average and one of the top
residents in his class. He sees a high number of patients per hour and is proactive in seeking proce-
dures. In accordance with this positive self-outlook on his abilities, he feels his self-assessment will
be more accurate than faculty evaluations and should describe his many strengths as a resident. As
such, he filled out his self-assessment form with a very positive description of his abilities.

At his meeting, the PD told him that his self-assessment is very different from the faculty evalua-
tions of him. Specifically, faculty evaluations mirrored his shift evaluations and expressed concern
regarding his below average clinical knowledge and skills.

Adam is extremely perplexed at the discrepancy and tells his PD: “But how? I don’t understand. I
know I’m better than my peers and I feel I’m one of the top interns in my class!”

OVERVIEW | SELF-ASSESSMENT
Eva and Regehr define traditional self-assessment as a “personal, unguided reflection on
performance for the purposes of generating an individually-derived summary of one’s
own level of knowledge, skill, and understanding in a particular area.”1 Individuals
themselves are the source of information and look inward to generate an assessment of
their own knowledge and abilities. Self-assessment can be further categorized into three
different perspectives:2

● Summative (assessing an overall performance or one’s abilities in general)
● Predictive (assessing one’s ability to perform in new situations)
● Concurrent (assessing ongoing performance while conducting an activity)

Much of the health professions literature has adhered to this traditional definition when
referring to and studying self-assessment. Traditional self-assessment can take on many
forms including formal self-assessment questionnaires, checklists, journal or diary entries,
patient chart reviews, or reviews of videotaped performances as common examples.

Within education theory, self-assessment is key to diagnosing one’s learning needs, an
early step in the process of self-directed learning as described by Knowles.3 Self-
assessment can also complement learning, as suggested by constructivist theory, with a self-
assessment of existing knowledge serving as a foundation for new information that can
be built upon and as an aid in integrating new knowledge.4

Additionally, professional societies rely on accurate self-assessment as an essential
premise for self-regulation in health professionals.1 An individual is expected to take re-
sponsibility and assess their own knowledge base and clinical practice, identify areas for
improvement, pursue educational opportunities addressing these areas, and then put this
new knowledge into action as performance improvement.5 Similarly, self-assessment has
been incorporated into undergraduate and graduate medical education,6 continuing
professional development, and maintenance of certification programs.7,8

MAIN ORIGINATORS OF THE THEORY


•Kevin Eva
•Glenn Regehr

Other important authors/works in this area:


•Boud D. Avoiding the traps: seeking good practice in the use of self-assessment and
reflection in professional courses. Soc Work Educ. 1999;18(2):121-132.
•Boud D. Enhancing Learning Through Self-Assessment. Routledge; 1995.
•Sargeant J, Mann K, van der Vleuten C, Metsemakers J. “Directed” self-assessment:
Practice and feedback within a social context: J Contin Educ Health Prof.
2008;28(1):47-54.
•Sargeant J, Armson H, Chesluk B, et al. The processes and dimensions of informed
self-assessment: a conceptual model. Acad Med. 2010;85(7):1212-1220.
•Kruger J, Dunning D. Unskilled and unaware of it: how difficulties in recognizing
one’s own incompetence lead to inflated self-assessments. J Pers Soc Psychol.
1999;77(6):1121-1134.

Background
Intuitively, self-assessments should have high fidelity and be the most accurate representa-
tion of our current knowledge, skills, and abilities, since we have more information about
ourselves than those external to us.9 The cumulative literature on self-assessment tells a
different story however, one that shows the accuracy of traditional self-assessment to be
quite poor when compared to external standards.

In his review of the self-assessment literature in medical education from 1970 to 1991, Gor-
don found that the validity of self-assessment was low to moderate when compared with
external criteria. He also theorized that it is a skill that can be improved and should be in-
tegrated with other sources of evaluative data to improve its validity and accuracy.10
Davis, et al. arrived at a similar conclusion, finding physician self-assessment inaccurate
when compared with external observations as a reference standard.11

A possible explanation for the inaccuracies in self-assessment arose from landmark research
conducted by Kruger and Dunning.12 In a series of experiments, they demonstrated that
the skills required to perform well in a given area are the same skills needed to accurately
assess one’s performance in that area. In other words, by lacking the requisite skills that
define competence, the “unskilled” are unable to identify the presence or absence of those
same skills in themselves, making them the least accurate when it comes to self-assessment.
Other papers have replicated similar findings, highlighting a flaw in using self-assessment
alone as the sole determinant for self-directed learning efforts.13

Around the same time, Boud offered a new perspective on the traditional notion of self-as-
sessment. Instead of relying on the individual alone to conduct a self-assessment, Boud
proposed that self-assessment should not be an unguided task left to the individual by
themselves. He suggested that self-assessment should also incorporate external information
from peers, instructors, and other sources of information outside the individual to guide
and add validity to the self-assessment process.14,15

Eva and Regehr dubbed Boud’s conception of self-assessment “self-directed assessment
seeking,” defined as “explicitly seeking external sources of information for formative
and summative assessments of one’s current level of performance and practice
improvement”.1

Based on consistent evidence demonstrating that traditional self-assessment has many limi-
tations, Eva and Regehr suggest that self-assessments created after seeking feedback and
incorporating information from external sources are more valuable for directing perfor-
mance improvement than relying on the individual’s assessment alone.1

Modern takes on this Theory


Sargeant et al. proposed a conceptual model of informed self-assessment with three main
components. These components include:

1) Gathering both internal and external sources of information
2) Integrating the two sources of information with the help of a facilitator
3) Responding to the information - either rejecting or accepting it and applying the data
towards performance improvement16

Different types of external data and strategies exist to inform the modern notion of self-as-
sessment and overcome the cognitive biases limiting its accuracy.

External feedback can inform, add value, and improve the accuracy of self-assessment once
incorporated into it. Learners should be actively seeking feedback from senior colleagues,
supervisors, program directors, faculty, and peers to inform their self-assessment. Eva and
Regehr argue that we should focus our efforts towards obtaining trustworthy feedback,
improving our abilities to act on this feedback without feeling threatened, and teaching
others how to share feedback in a way that will improve its acceptance by the recipients.1

Multi-source feedback is feedback compiled from different types of individuals observing


and offering assessments on different aspects of a learner’s performance. It is extremely
useful for areas that are difficult to gauge via self-assessment, including dimensions such as
professionalism, communication, and interpersonal relationships.17 For example, to assess
resident professionalism, a PD might obtain data from nurses, techs, and physicians from
other specialties. The aggregation of several sources improves the validity of the informa-
tion obtained.

Validation of self-assessment using external standards may also be a valuable tool. Clinical
guidelines, consensus-based performance standards, and benchmarking with physicians of
similar practice profiles can serve as a “reality check” to improve the effectiveness of self-
assessments as well.18

Guided reflection with the help of a trusted facilitator or mentor can also function as a
bridge between one’s self-perception and external sources of feedback, even serving to rec-
oncile the two when there is discordance between them.19

Other Examples of Where this Theory Might Apply


In post-graduate medical education, the ACGME core competency of Practice-Based Learn-
ing and Improvement requires residents to demonstrate self-directed learning.20 Self-as-
sessment can play a role in this by identifying a resident’s professional strengths and
weaknesses in order to guide such self-directed learning efforts.

The six ACGME core competencies21 or ACGME Milestones22 themselves can serve as the
organizing structure for a resident’s self-assessment. Each could serve as an accepted ex-
ternal standard to improve the validity of their self-assessment. A trusted faculty advisor
can then add more accuracy to the assessment by incorporating external data such as multi-
source feedback and faculty evaluations via guided reflection with the resident.

An example of self-assessment applied within the clinical realm exists with the Mainte-
nance of Certification (MOC) program for the Royal College of Physicians and Surgeons of
Canada. The MOC program rewards physicians with continuing professional development
credits for self-assessment activities that assess clinical knowledge and performance against
objective measures. It also encourages reflection on performance not just at the individual
level but also with peers.8

Another example exists in the United Kingdom’s National Health Service concept of ‘ap-
praisal.’ Here, an external appraiser (i.e. a mentor or senior clinician) reviews an individual
physician’s self-assessment and portfolio with them and can also incorporate external data
such as multi-source feedback.11,23

Limitations of this Theory


Many poor performers believe at baseline that they are above average in comparison to
their peers, a superiority bias also known as the Lake Wobegon effect. Self-assessment
accuracy can also vary depending on the context: it tends to be higher in subject areas
where we are skilled and lower in subjects with which we are unfamiliar.2

Certain areas such as interpersonal skills, communication skills, and professionalism are
inherently difficult to accurately assess by oneself and are more conducive to assessment
via a method like multi-source feedback.17

Another limitation within self-assessment theory is the assumption that once a knowledge
or performance gap is brought to a learner’s attention they will readily accept this gap and
be motivated to address it. Individuals can have several cognitive mechanisms functioning
as a “psychological immune system” serving to fight off, rationalize, and dismiss
unfavourable assessments while maintaining a positive and optimistic outlook on one’s
abilities, limiting motivation to address performance gaps.2,26

ANNOTATED BIBLIOGRAPHY
Kruger J, Dunning D. Unskilled and unaware of it: how difficulties in recognizing one’s
own incompetence lead to inflated self-assessments. J Pers Soc Psychol. 1999;77(6):1121-1134.12

This landmark paper used a series of experiments to highlight the limitations of self-assessment.
Individuals who performed poorly were also the least able to accurately self-assess their performance in
these experiments. Their findings suggest that knowledge and competence within a given area (or lack
thereof) also defines one’s ability to accurately assess competence in that area as well.

Eva KW, Regehr G. “I’ll never play professional football” and other fallacies of self-
assessment. J Contin Educ Health Prof. 2008 Winter;28(1):14-19.24

This article draws distinctions between self-assessment as an ability, self-directed assessment seeking as a
pedagogical strategy, and self-monitoring, which is a moment-by-moment awareness of the likelihood that
one maintains the skill and knowledge to act in a particular situation.

Regehr G, Eva K. Self-assessment, self-direction, and the self-regulating professional. Clin
Orthop Relat Res. 2006 Aug;449:34-38.5

In this paper, the authors review the literature in adult education, medical education, and cognitive
psychology to highlight two critical flaws in self-assessment and self-regulation. They challenge the
assumption that the process of self-assessment as conceptualized in this model can lead to the
identification of gaps in skills or knowledge.

Sargeant J, Armson H, Chesluk B, et al. The processes and dimensions of informed self-
assessment: a conceptual model. Acad Med 2010; 85: 1212-20.16

This qualitative study of self-assessment used focus groups to create a multidimensional conceptual
model of informed self-assessment. This model integrates internal sources of information (i.e. an
individual self-assessment) with external sources (i.e. evaluation forms), and finishes with deliberate
reflection by learners on the integrated assessment with the guidance of a trusted mentor.

Colthart I, Bagnall G, Evans A, Allbutt H, Haig A, Illing J, McKinstry B. The effectiveness of
self-assessment on the identification of learner needs, learner activity, and impact on clinical
practice: BEME Guide No. 10. Medical Teacher 30(2): 124–45.25

This systematic review suggests that the accuracy of self-assessment can be enhanced by feedback,
particularly video and verbal, and by providing explicit assessment criteria and benchmarking guidance.
Self-assessment needs to be used as one tool amongst others to provide a more complete appraisal of
competence in healthcare practice.


Returning to the case…


Adam’s glowing self-assessment in the face of multiple sources of negative feedback is an example of
the Dunning-Kruger effect. His lack of competence limits the accuracy of his efforts to assess his
own abilities as a resident. In fact, he overestimates his abilities and feels he is above average relative
to his peers.

The discordance between Adam’s assessment of his abilities and those around him created a layer of
cognitive dissonance. He displayed many of the psychological defense mechanisms that inherently
protect us from negative information and he proceeded to deny, dismiss, and rationalize many of the
objective external sources of data he was presented with.

To add more validity to Adam’s self-assessment, his PD decided that Adam’s new self-assessment
could be organized around the ACGME Milestones as an accepted external standard to serve as a
benchmark. To improve the accuracy of his self-assessment, he provided additional external data as
well, including feedback from nurses and techs in the department and faculty evaluations.

Adam’s PD also decided that a trusted faculty advisor who can help Adam reconcile the gap between
his self-assessment and the external sources of data would be essential. The advisor presented the ex-
ternal data to Adam in a non-threatening manner while trying to maintain Adam’s self-efficacy.
Guided reflection led by the advisor served as an important bridge between Adam’s self-assessment
and the external sources of data. With help, Adam identified learning goals and an action plan for
remediation. Adam followed this plan across his PGY-2 year and did indeed rise to become one of the
top residents in his class by the end of residency.

References:
1. Eva KW, Regehr G. “I’ll never play professional football” and other fallacies of self-assessment.
J Contin Educ Health Prof. 2008;28(1):14-19.

2. Eva KW, Regehr G. Self-Assessment in the Health Professions: A Reformulation and Research
Agenda. Acad Med. 2005;80(10):S46.

3. Manning G. Self-directed learning: A key component of adult learning theory. Business and
Public Administration Studies. 2007;2(2):104.

4. Bourke R, Mentis M. Self-assessment as a lens for learning. In: ResearchGate. ; 2007:322.

5. Regehr G, Eva K. Self-assessment, Self-direction, and the Self-regulating Professional. Clin
Orthop Relat Res. 2006;449:34-38. doi:10.1097/01.blo.0000224027.85732.b2

6. Hildebrand C, Trowbridge E, Roach MA, Sullivan AG, Broman AT, Vogelman B. Resident self-
assessment and self-reflection: University of Wisconsin-Madison’s Five-Year Study. J Gen
Intern Med. 2009;24(3):361-365.

7. Duffy FD, Lynn LA, Didura H, et al. Self-assessment of practice performance: development of
the ABIM Practice Improvement Module (PIM). J Contin Educ Health Prof. 2008;28(1):38-46.

8. Silver I, Campbell C, Marlow B, Sargeant J. Self-assessment and continuing professional
development: the Canadian perspective. J Contin Educ Health Prof. 2008;28(1):25-31.

9. Eva KW, Regehr G. Effective feedback for maintenance of competence: from data delivery to
trusting dialogues. CMAJ. 2013;185(6):463-464.

10.Gordon MJ. A review of the validity and accuracy of self-assessments in health professions
training. Acad Med. 1991;66(12):762-769.

11.Davis DA, Mazmanian PE, Fordis M, Harrison RV, Thorpe KE, Perrier L. Accuracy of
Physician Self-assessment Compared With Observed Measures of Competence: A Systematic
Review. JAMA. 2006;296(9):1094-1102.

12.Kruger J, Dunning D. Unskilled and unaware of it: how difficulties in recognizing one’s own
incompetence lead to inflated self-assessments. J Pers Soc Psychol. 1999;77(6):1121-1134.

13.Sadosty AT, Bellolio MF, Laack TA, Luke A, Weaver A, Goyal DG. Simulation-based
emergency medicine resident self-assessment. J Emerg Med. 2011;41(6):679-685.

14.Boud D. Enhancing Learning Through Self-Assessment. Routledge; 1995.

15.Boud D. Avoiding the traps: seeking good practice in the use of self assessment and reflection
in professional courses. Soc Work Educ. 1999;18(2):121-132.

16.Sargeant J, Armson H, Chesluk B, et al. The processes and dimensions of informed self-
assessment: a conceptual model. Acad Med. 2010;85(7):1212-1220.

17.Donnon T, Al Ansari A, Al Alawi S, Violato C. The reliability, validity, and feasibility of
multisource feedback physician assessment: a systematic review. Acad Med.
2014;89(3):511-516.

18.Galbraith RM, Hawkins RE, Holmboe ES. Making self-assessment more effective. J Contin
Educ Health Prof. 2008;28(1):20-24.

19.Sargeant J, Mann K, van der Vleuten C, Metsemakers J. “Directed” self-assessment: Practice
and feedback within a social context. J Contin Educ Health Prof. 2008;28(1):47-54.

20.NEJM Knowledge+ Team. Practice-based Learning and Improvement: ACGME Core
Competencies. NEJM Knowledge+. https://knowledgeplus.nejm.org/blog/practice-based-
learning-and-improvement/. Published July 28, 2016. Accessed October 16, 2018.

21.Li S-TT, Paterniti DA, Tancredi DJ, et al. Resident Self-Assessment and Learning Goal
Development: Evaluation of Resident-Reported Competence and Future Goals. Acad Pediatr.
2015;15(4):367-373.
22.Meier AH, Gruessner A, Cooney RN. Using the ACGME Milestones for Resident Self-
Evaluation and Faculty Engagement. J Surg Educ. 2016;73(6):e150-e157.

23.Conlon M. Appraisal: the catalyst of personal development. BMJ. 2003;327(7411):389-391.

24.Eva KW, Regehr G. “I’ll never play professional football” and other fallacies of self-
assessment. J Contin Educ Health Prof. 2008. https://onlinelibrary.wiley.com/doi/abs/
10.1002/chp.150.

25.Colthart I, Bagnall G, Evans A, et al. The effectiveness of self-assessment on the identification
of learner needs, learner activity, and impact on clinical practice: BEME Guide no. 10. Med
Teach. 2008;30(2):124-145.

26.Gilbert DT, Wilson TD. Miswanting: Some Problems in the Forecasting of Future Affective
States. In: Thinking and Feeling: The Role of Affect in Social Cognition. Cambridge University
Press; 2001:178-197.

CHAPTER 9

Bolman & Deal Four-Frame Model


Authors: Lexie Mannix, MD; Shawn Mondoux, MD; David Story, MD
Editor: Michael Gottlieb, MD

A Case
Rhonda is a new program director (PD) in a three-year Emergency Medicine training program in
the United States. She has been PD for the past year and is leading the curricular conversion of the
weekly conference day from traditional didactic lectures to a more interactive flipped classroom
model. Although this change is supported by the literature, Rhonda is already hearing rumblings of
discontentment from two separate stakeholder groups: the residents and the academic teaching staff.

Rhonda progressed to the PD role relatively early in her career. She has been consistently recognized
for her attention to detail, tireless work on establishing strong program frameworks, and efforts to
actively evaluate the successes and failures of her interventions. Despite this, she is feeling under-
equipped to deal with a change management problem. She has astutely recognized that her ability to
prepare the structural elements for this change may not be enough to convince either or both of these
stakeholder groups.

Rhonda wishes she could find a framework that might help her understand which approach to change
might be most useful for each of the stakeholder groups. She has recognized through conversation
with these groups that they have different concerns and needs when applying the flipped classroom
model. As such, she has recognized that addressing these concerns with explicit and deliberate
change strategies might be the best approach.

OVERVIEW | BOLMAN & DEAL FOUR-FRAME MODEL
In 1984, Lee Bolman and Terrence Deal published their theory to describe methods for in-
fluencing and invoking change within an organization.1 They present four frames, or
lenses, through which the mechanism of change can be viewed: structural, human re-
source, political, and symbolic. Factors in determining the appropriate frames to use in a
given situation include the type of organization, the people undergoing the change, and
the degree of change being implemented. A leader may show proclivity for one or two
frames but may be resistant to using others due to lack of familiarity or comfort, resulting
in recurrent use of less effective techniques. Success or failure of an attempted change
movement can often be explained retrospectively via this model through the interpreta-
tion of the lenses used.

MAIN ORIGINATORS OF THE THEORY


• Lee Bolman
• Terrence E. Deal

Background
Most organizations have a hierarchical system that places individuals in positions of leadership. Leaders and managers vary widely in their skills, attributes, leadership styles, and interpersonal strengths. Recognizing different perspectives is a necessary skill for effectively managing individuals with various personality types. However, many people in supervisory positions are not well versed in reframing their perspective or are not aware of how to do so.

In 1984, Bolman and Deal wrote Reframing Organizations: Artistry, Choice and Leadership, with the goal of defining four different perspectives, or frames, that can be used to assess organizations, leaders, and events.1 Each frame is unique in its viewpoint and illuminates specific aspects of the case in question. A given person or situation may need to be viewed through one or several lenses to obtain a complete picture. Many people tend to use one or two frames preferentially, so the ability to intentionally reframe a situation through alternative perspectives is a valuable skill for effective problem solving.

Bolman and Deal’s Four Frames Model:

1. Structural

This frame addresses the “how” of change. It focuses primarily on strategy: clarifying tasks and responsibilities, setting measurable goals and deadlines, and creating systems and protocols. This frame is well-suited for organizations and managers that deal in analysis and logic. Roles and goals are generally well-defined. The emphasis is on rationality, facts, and data. The organization itself can be thought of as a machine requiring the precise movement of many cogs. As such, the leader will need to be direct, focused, and methodical.

2. Human Resource

This frame is focused on the individual, their needs, and their value within the organiza-
tion. The emphasis is on giving team members the power and opportunity to perform their
jobs well. Interpersonal skills are critically important as coaching, motivation, guidance,
and support of the individual are key in establishing the role and fit. The organizational
goal is often empowerment and job satisfaction within the workforce.

3. Political

This frame emphasizes the importance of addressing conflicts between individuals or dif-
fering interest groups. The characteristics of an organization viewed through this lens in-
clude scarcity, power, allies, and deal-brokering. Leaders need advocacy, networking, and
negotiation skills. The emphasis is on using any and all assets to maximize the benefits to
the unit, organization, or workforce with the recognition that not all needs may be met.

4. Symbolic

This frame is often described as theatrical because the focus is on aligning individual goals
with organizational goals to create a sense of purpose or meaning in one’s work. The man-
ager must be a charismatic visionary with the ability to excite and motivate through story-
telling and showmanship. The leader should ensure that there is a motivating vision and
actively recognize excellent performance in their team members.

The four frames are guides that can be used to evaluate a complex situation and determine solutions. Reframing a problem through these lenses provides insight into the root causes behind an issue, possible paths for forward progress, and methods for achieving the desired goal. Analysis from a single viewpoint rarely offers the complete picture. Organizations and people are complex entities, requiring thoughtful analysis when evaluating, motivating, or initiating change. Using Bolman and Deal’s organizational theory allows one to gain clarity about the task at hand and develop a roadmap toward the most efficient and appropriate strategies for achieving success.

Modern takes on this Theory


Bolman and Deal have released six editions of their book. While the examples have been updated over time, the core tenets and components of the theory have remained largely unchanged.

Gwen Moran wrote an article in Fast Company that addressed using reframing in the lives of individuals.2 Setting aside the organizational focus of Bolman and Deal, she applied reframing to individuals and events. She discussed the benefits of reframing, including putting a positive spin on a generally negative situation and reassessing a task that has stalled or is heading in the wrong direction.

Additionally, Bolman and Deal created a Leadership Orientation Survey, which measures an individual’s propensity to lead through each of the four frames. Completing the questionnaire yields a detailed analysis of which frames the survey-taker prefers, allowing for the identification of frames that they may be hesitant to use. The survey (available here: http://www.leebolman.com/orientations.htm) has been used in management research to study the leadership styles and frames of employees.

Other Examples of Where this Theory Might Apply

Successfully enacting change requires buy-in from all of the involved parties. Being able to
reframe an idea allows the change-maker to address concerns from different groups in a
manner that will resonate at an individual level.

Patient flow modifications may create some anxiety among members of the healthcare team. For example, developing a Clinical Decision Unit (CDU) will impact physicians, nurses, and hospital administrators. A CDU is more akin to inpatient medicine and will require physicians to disposition patients after they have undergone tests seldom ordered in an Emergency Department (ED) setting. Emergency nurses generally function at a high-acuity, rapid pace in the ED; in a CDU setting, however, the excitement is more limited, which may make the job less appealing. Hospital administration also needs convincing that creating a CDU will increase the institution’s bottom line while avoiding a negative impact on other hospital processes. Three groups with vastly different motivations need to be addressed. The physician group may be best approached through the Structural frame, while nursing may be best viewed using the Human Resource lens, and hospital administration evaluated with the Political frame.

In the education setting, adopting an alternative method of evaluating learners is likely to
cause some challenges, as well. End-of-shift evaluation forms, while common, are often
seen as relatively low yield by faculty. Requiring faculty to complete the forms without
understanding the potential benefit can lead to poor responses. Similarly, if learners do not
see a value, they may be less likely to incorporate the feedback. Therefore, it is important to
address this with both faculty and learners to ensure that there is proper support. This may
involve one or more different frames.

Limitations of this Theory

One limitation of this theory lies in the difficulty of choosing a frame for a given situation. There
is no blueprint for selecting which lens will result in the correct focus, and in some cases,
multiple techniques may need to be considered and utilized in order to successfully
integrate change. Picking the wrong frame to address a change may result in a significant
error that could prematurely damage or destroy a potential initiative.

A second limitation of this theory is that it provides little guidance in environments with more distributed leadership. When multiple agents are simultaneously effecting change, it becomes less clear how to apply strategic changes through a variety of avenues.

Finally, this theory simplifies management styles into single approaches when, in fact, several of them may need to be combined to effect change. Broader approaches to leading change are described in texts such as “Leading Change” by John Kotter and “The 7 Habits of Highly Effective People” by Stephen Covey.

ANNOTATED BIBLIOGRAPHY

Bleich MR. Job and Role Transitions: The Pathway to Career Evolution. Nurs Adm Q.
2017;41(3): 252-257.3

Bleich provides a summary of the four frames by Bolman and Deal and then uses this to guide
readers regarding their readiness and need for change.

Lieff SJ, Albert M. The Mindsets of Medical Education Leaders: How Do They Conceive of Their Work? Acad Med. 2010 Jan;85(1):57-62.5

Lieff and Albert explore the use of each of Bolman and Deal’s four frames by medical education
leaders. The authors determined that most leaders in medical education use all four frames, with
the majority favoring the human resource frame. The article contains an excellent visual depiction
of each frame and the themes in medical education in which to apply each frame.

Sasnett B, Clay M. Leadership styles in interdisciplinary health science education. J Interprof Care. 2008;22(6):630-638.6

Sasnett and Clay evaluate interdisciplinary health science education, including multiple areas
within medicine. The authors discuss how different areas of medicine (eg, nursing, occupational
therapy, medical residency directors, radiation therapy, interdisciplinary, and health information
management) use varying frames. According to the article, the most prevalent frame across all ar-
eas of medicine is the human resource frame.

Swan-Sein A, Mellman L, Balmer DF, Richards BF. Sustaining an advisory dean program through continuous improvement and evaluation. Acad Med. 2012;87(4):523-528.7

Swan-Sein and colleagues discuss applying Bolman and Deal’s four frames with regard to the ad-
visory dean program at the Columbia University College of Physicians and Surgeons. This article
is a real-life example of applying the four frames to an academic medicine organization, including
specific strategies associated with each frame.

Chan TM, Luckett-Gatopoulos S, Thoma B. Commentary on competency-based medical education and scholarship: creating an active academic culture during residency. Perspect Med Educ. 2015;4(5):214-217.

Chan and colleagues take us through a medical education example of how Bolman and Deal’s four
frames can help us to dissect out a problematic situation in graduate medical education around re-
search support for trainees.

Returning to the case...

After reading Bolman and Deal, Rhonda feels empowered to address the concerns of both groups. Her change management strategy is now tailored to the individual needs of each stakeholder group. She feels the following strategies will allow her to implement the change more successfully within her program.

With staff members, Rhonda heard concerns that existing educational materials would have to be reworked and that these individuals felt unprepared for the curricular change. She therefore decided to use the “structural” frame with this group. Many of these individuals did not understand what a “flipped classroom” entailed, so Rhonda stayed with her usual structured approach to change management. She established small group sessions for staff members who required more background on the flipped classroom model. She disseminated a slide deck outlining its core principles and used slide labeling to reinforce the model. She also devised a formal document outlining the core reading list to be disseminated to residents for each session. Finally, she set up an automated email schedule of staff reminders for each step of the presentation and session preparation. Staff felt well supported in their transition to a flipped classroom model.

With the residents, Rhonda applied a different approach to change management. Residents were concerned about the additional preparatory work required before the educational sessions and about balancing clinical requirements with this new work. With this group, Rhonda tried the “human resource” approach. She recognized that residents needed a clear and caring approach to the change, as well as an open communication channel with the PD to discuss stressors, problems, and ongoing solutions. Specific residents were appointed to bring these concerns forward. Rhonda established a WhatsApp channel for all of the residents to engage in asynchronous discussion of any issues. Residents took part in the curricular design to ease the workload over the year, and a quick discussion of any issues preceded each academic day.

After following the model discussed above, Rhonda feels that the change implementation was much
smoother than she could have otherwise hoped. Although the change remained complex with ongoing
operational and individual challenges, the overall impression of the conversion to the flipped
classroom has been positive. Both teaching staff and learners are much happier with the new
education model.

References

1. Bolman LG, Deal TE. Reframing Organizations: Artistry, Choice, and Leadership. John Wiley &
Sons; 2017.

2. Moran G. 5 Ways to Change the Way You Think About Negative Life Events. Fast Company. August 4, 2014. https://www.fastcompany.com/3033887/5-ways-to-change-the-way-your-think-about-negative-life-events

3. Bleich MR. Job and Role Transitions: The Pathway to Career Evolution. Nurs Adm Q.
2017;41(3):252-257.

4. Gallos JV. Reframing complexity: a four-dimensional approach to organizational diagnosis, development, and change. In: Organization Development. San Francisco: Jossey-Bass; 2006. https://www.tntech.edu/assets/uploads/Reframing_Complexity.pdf.

5. Lieff SJ, Albert M. The Mindsets of Medical Education Leaders: How Do They Conceive of
Their Work? Acad Med. 2010 Jan;85(1):57-62.

6. Sasnett B, Clay M. Leadership styles in interdisciplinary health science education. J Interprof Care. 2008;22(6):630-638.

7. Swan-Sein A, Mellman L, Balmer DF, Richards BF. Sustaining an advisory dean program
through continuous improvement and evaluation. Acad Med. 2012;87(4):523-528.
CHAPTER 10

Kotter’s Stages of Change


Authors: Dallas Holladay, DO; Melissa Parsons, MD; Gannon Sungar, DO
Editor: Daniel Robinson, MD

A Case
Christy was ecstatic when the residency program director offered her the opportunity to overhaul the
residency conference curriculum. As a junior faculty member with an interest in education and an
alumna of the residency program, she knew all too well the shortcomings of the current
curriculum and saw this as a great opportunity to have a positive impact on the residency. With the
goal of eventually joining the residency leadership, Christy also saw this as a great professional
opportunity for her to develop and demonstrate some success in an area about which she was
passionate.

The residency conference curriculum had long been a source of struggle within the program. Faculty engagement and participation were low, and the residents would often complain about the same boring lectures or the lack of relevant content. Residents attended conference only to satisfy requirements and were frequently caught sleeping or trying to complete their charts from their latest shift. Year after year, conference was battered on the annual residency survey, and for good reason: it needed fixing.

Christy was confident that she had the necessary skills to create a more interesting and engaging
curriculum and give both residents and faculty what they wanted. But she faced some challenges as
well. This would mark the third significant curriculum overhaul in the past 6 years. Faculty were
tired of efforts to reinvigorate the curriculum. Many felt that since the past efforts had failed, there
was no sense in trying again.

And what if she failed? While this was a great opportunity to make a lasting mark on the residency
program and to demonstrate her skills as an educator and leader, Christy needed to succeed. If things
didn’t improve, it would reflect very poorly on her.

Questions for the reader:

· Why did past attempts to improve the curriculum fail?


· What can Christy do to ensure the successful implementation of the new curriculum?

OVERVIEW | KOTTER’S CHANGE FRAMEWORK
John Kotter, a professor at Harvard Business School and world-renowned change expert, studied over 100 companies, in the US and globally, varying in size and industry type. His research on change implementation in these companies enabled him to describe why and how a majority of change efforts fail. He developed an eight-step model to avoid major errors in the change process. There are two key lessons that Kotter conveys through his change model. First, “the change process goes through a series of phases that, in total, usually require a considerable length of time,”1 but “skipping steps creates only the illusion of speed and never produces a satisfying result.”1 Applying Kotter’s eight-step change model therefore requires progressing in sequential order and giving each step adequate time to be completed before moving on to the next. The second lesson is that “critical mistakes in any of the phases can have a devastating impact, slowing momentum and negating hard-won gains.”1 No step can be omitted or glossed over if change is to be implemented successfully.

MAIN ORIGINATORS OF THE THEORY


•John Kotter
Kotter JP. Leading Change: Why Transformation Efforts Fail. Harvard Business Review. 1995;(March-April):59-67.
Kotter JP. Leading Change. Boston: Harvard Business School Press; 1996.

Background
Kotter initially described his 8 steps in terms of errors that the companies he observed
made:1

Error 1: Not establishing a great enough sense of urgency.


Error 2: Not creating a powerful enough guiding coalition.
Error 3: Lacking a vision.
Error 4: Undercommunicating the vision by a factor of ten.
Error 5: Not removing obstacles to the new vision.
Error 6: Not systematically planning for and creating short-term wins.
Error 7: Declaring victory too soon.
Error 8: Not anchoring changes into the corporation’s culture.

Beyond describing these errors, Kotter also reformulated his eight-step model for transforming organizations as a series of actions.

1. Establish a sense of urgency.

● The company or group of individuals involved needs to develop a sense of urgency around the need for change. Approximately 75% of an organization’s management needs to be convinced that change is required to achieve success.1

● Consider performing a SWOT analysis to identify strengths, weaknesses, opportunities, and potential threats. Examine the market and competitive realities.2

2. Form a powerful guiding coalition.

● To lead change, a coalition or team of influential people from a variety of sources needs to come together to continue building urgency and momentum around the need for change. This team does not need to follow the traditional company hierarchy.1 It is ideal to have a good mix of people from different departments and different levels within the company, not just the top managers.

● To make this team a powerful coalition, consider team-building events such as an off-site retreat or other teamwork exercises.1,3

The remaining steps mirror errors 3 through 8: create a clear vision for the change, communicate that vision widely, remove obstacles that block the new vision, plan for and create short-term wins, avoid declaring victory too soon, and anchor the changes in the organization’s culture.1

Modern takes on this Theory


In 2014, 18 years after the initial publication, Kotter’s 8 Steps of Change underwent some revisions. In the updated version, Kotter recommends that, instead of performing the steps sequentially to achieve episodic change, the steps should be run continuously and concurrently. This more accurately represents the fast-paced evolution of change in the modern, digital age and is more consistent with sustaining success. Improving on the original step 4, the revised model promotes the idea of “building an army of volunteers” to disseminate the message across the organization, which strengthens communication and enhances the urgency created in step 1. Initially, Kotter described a traditional hierarchy; the revisions, however, recognize that change is best implemented when there is a flexible network as well as a hierarchy. This helps build a network of change volunteers horizontally and vertically across an organization and reinforces the earlier steps of creating urgency and sustaining communication about the idea. Additional modifications include constantly seeking new opportunities and acting on them quickly, which allows for improved implementation of step 6 and the generation of short-term wins.3,4

Other Examples of Where this Theory Might Apply

While Kotter’s 8 Stages of Change are typically discussed in the context of large, institutional change, the principles can be loosely applied in micro situations as well. For example, if, as an educator, you have decided to implement a new education style such as bedside teaching, Kotter’s method can be useful. In this example, at the beginning of the shift the educator can create a sense of urgency by discussing the literature supporting improved learning and resident satisfaction with bedside teaching. Creating a guiding coalition can involve the educator, the learner, and the patient; involving each patient in the coalition can enrich the experience for the learner. Involve the learner by creating a clear vision of the goals for each bedside teaching encounter. During a short encounter, such as a single shift, frequent communication of the goals of bedside teaching is less important; instead, debriefing can be used to ensure the goals are being met. Remove obstacles by eliciting frequent feedback from the learner and incorporating suggestions as necessary. Create short-term wins by praising learners’ willingness to try different education models. Steps 7 and 8 are somewhat less applicable, as they are geared toward larger institutional change, but they could apply loosely in this scenario by maintaining consistent use of bedside teaching throughout the shift rather than switching between teaching methods.

Limitations of this Theory

While Kotter’s model of change management was an instant success when published in the late 1990s and is widely viewed as the definitive model for successfully implementing institutional change, it has been criticized on a number of fronts. From a methodological standpoint, both the original article and the subsequent book were based on Kotter’s personal experience and did not reference any outside sources. In fact, neither the original article nor the book has footnotes, references, or a bibliography.5 Since the original publication, much of the evidence supporting Kotter’s model has been published by Kotter himself, raising concerns about the validity of the model.5

With regard to the model itself, one common criticism is that it is too rigid.5 Change is a dynamic and complex process, and Kotter’s original model stresses the successful completion of each step before proceeding to the next, with the assumption that change is a one-time event that ends in stability. While this approach looks good on paper, many have argued that, in reality, change is continuous. Steps likely occur in parallel, and often steps will need to be revisited during a single change process. Additionally, the original model argued for formation of the guiding coalition from high-level management, creating a top-down process in which employees can be seen as the object of change rather than participants.6 Both of these critiques were addressed by Kotter in follow-up publications in 2012 and 2014,4 in which he argues for more flexibility and agility within his own framework and for the formation of a “volunteer army” from all levels of the organization to replace the guiding coalition.

Returning to the case...

Christy’s first task was to address the complacency and the lack of interest and engagement from both faculty and residents regarding the conference curriculum. She felt that her program was one of the top programs in the country; it was simply unacceptable that it had a conference curriculum that was so weak and uninspiring. Christy did a SWOT analysis to better understand the Strengths, Weaknesses, Opportunities, and Threats within the current curriculum. She met with residents and faculty to create a sense of urgency: improving the curriculum was a must.

Next, Christy created a committee including departmental leadership, residency program leadership, faculty, and residents to create a vision for what the new curriculum would entail. This group, covering all levels of the program, was relentless in communicating not only the need for change but also its clear, concise vision for a more interactive, interesting, and high-yield curriculum. Christy met with residency leadership to see what structural changes could be made to how conference was organized in order to make room for her innovative ideas. She empowered faculty and residents to brainstorm new and engaging ideas for the curriculum. After piloting a few of these ideas and seeing the excitement of both residents and faculty, Christy and her committee continued to refine the curriculum and made sure to communicate and celebrate their successes with leadership, faculty, and residents. As conference became more interesting and fun, faculty participation began to grow. Christy worked with departmental and residency leadership to solidify the expectation of faculty participation within the culture of the program.

Two years later, the conference curriculum was still running strong and continuing to innovate. Residents and faculty alike are now proud of the curriculum, boasting about it during recruitment season. The program has become much stronger due to the curricular changes, and Christy has established herself as an educational leader and change agent within the department.

ANNOTATED BIBLIOGRAPHY
Kotter JP. Leading Change: Why Transformation Efforts Fail. Harvard Business Review.
1995; (March-April): pp. 59–67.
This is the initial article published by John Kotter elaborating his eight-step pathway for successfully leading change in companies. It was later expanded into the book Leading Change. In the original article, the steps are defined in terms of the errors that Kotter saw companies make, rather than as a checklist or pathway of what to do. He does elaborate on these errors and how to avoid them in some detail; however, the article is relatively short, and the book covers the steps in much more detail.

Appelbaum SH, Habashy S, Malo JL, et al. Back to the Future: Revisiting Kotter’s 1996 Change Model. Journal of Management Development. 2012;31(8):764-782.
While widely accepted as the model for change management, Kotter’s eight steps were devel-
oped without any scientific evidence for their support. This article reviews the literature and
provides the available evidence to support or refute each of Kotter’s eight steps. In addition, it
mentions many of the major criticisms of Kotter’s original model.

Kotter JP. Accelerate! Harvard Business Review. 2012;90(11):45-58.


This 2012 article in the Harvard Business Review, written by Kotter himself, provides an update on his original change model to address the increased pace of change challenging many contemporary organizations. While Kotter supports the initial model, he offers some concessions regarding the rigidity of the sequential approach and recommends a less hierarchical approach than that proposed in the original model. Additionally, Kotter argues that organizations should adopt a dual operating system, with a management hierarchy to conduct day-to-day operations and a dynamic strategy network to identify opportunities and implement change.

References

1. Kotter JP. Leading change: why transformation efforts fail. Harv Bus Rev. 1995;(March-April):59-67.

2. Mento A, Jones R, Dirndorfer W. A change management process: grounded in both theory and practice. J Change Manag. 2002;3(1):45-59.

3. Kotter’s 8-Step Change Model - Change Management Tools from Mind Tools. https://www.mindtools.com/pages/article/newPPM_82.htm. Accessed July 9, 2018.

4. Kotter JP. Accelerate! Harv Bus Rev. 2012;(November 2012). https://hbr.org/2012/11/accelerate. Accessed July 9, 2018.

5. Appelbaum SH, Habashy S, Malo J-L, Shafiq H. Back to the future: revisiting Kotter’s
1996 change model. J Manag Dev. 2012;31(8):764–782.

6. O’Keefe K. Where Kotter’s 8 Steps Gets it Wrong. Corp Exec Board CEB. 2011. https://www.cebglobal.com/blogs/where-kotters-8-steps-gets-it-wrong/.

EDUCATION THEORY
MADE PRACTICAL
VOLUME 3

Academic Life in Emergency Medicine


Faculty Incubator
