SPECIAL ARTICLES

Effect of Clinical Teaching on Student Performance
during a Medicine Clerkship
Stuart A. Roop, MD, Louis Pangaro, MD
PURPOSE: To measure what proportion of student clerkship
performance can be attributed to teachers’ educational skills as
reported by students.
SUBJECTS AND METHODS: From August 1992 to June
1994, we collected critiques of teacher skills from 314 third-year
students at the end of a 12-week medicine clerkship. Interns,
residents, attending physicians, and student preceptors were
rated (on a 1 to 5 scale) on teaching behaviors from the 7 categories of the Stanford Faculty Development Program framework. A linear regression model was used to determine the relative contributions of the rated teaching behaviors in predicting
final student performance and improvement across the clerkship (“student growth”), measured using end-of-clerkship variables (clinical grades, National Board of Medical Examiners
medicine shelf examination, practical laboratory examination,
and an analytical essay examination) and preclerkship variables
(pre–third-year grade point average [GPA], United States Medical Licensing Examination, Step I, and clerkship pretest).
RESULTS: Data were available for 293 (93%) of 314 students,
who completed a total of 2,817 critiques. The students’ preclerkship GPA accounted for the greatest percentage of variance
in student performance (28%, P <0.0001). Clinical teaching
behaviors accounted for an additional 6% (P <0.0001) of the
variance. For student growth across the clerkship, teaching accounted for 10% of the variance (P <0.0001). Among the 7
Stanford educational categories, teaching behaviors promoting
control of session (r² = 5%, P = 0.0002) and fostering understanding and retention (r² = 4%, P = 0.001) had the greatest
effect. The resident had the most effect on student growth (r² =
6%, P = 0.0001) when compared with other teaching levels.
Teaching had a greater effect on growth for students with preclerkship GPA above the mean (16% versus 6%), for older students (24% versus 7%), and for students with a nonscience undergraduate degree (33% versus 9%).
CONCLUSION: The preclerkship GPA, reflecting 2 years of
work, was the most important predictor of student performance. Teaching behavior, as measured by student assessments,
also affected student performance. Am J Med. 2001;110:205–209. ©2001 by Excerpta Medica, Inc.

“Teaching . . . is intended to lead toward learning” (1). Educational research has explored the relation between the process of educating (what the teacher does) and the desired product of education (what the student learns). This “process-product paradigm” has also begun to influence the study and development of medical education.
One well-known initiative for the evaluation and development of clinical teachers is the Stanford Faculty Development Program (2), which has a framework for clinical teaching composed of seven educational categories: setting a learning climate, leadership in control of session, communicating goals, fostering understanding and retention, evaluation, feedback, and encouraging self-directed learning. Student ratings of their clinical teachers based on these educational categories are both reliable and valid (3,4). These student assessments of teaching skills are commonly used for the evaluation, development, and promotion of clinician-educators.
Prior studies have used faculty self-assessments (2,5–7) or student ratings (2,8,9) as the outcome measures for teaching effectiveness, rather than measuring student performance, as has been recommended (10). Few prior studies, however, have addressed the effect of teaching behaviors on this outcome (11), perhaps because many other factors influence students’ learning. Students enter clerkships with learning habits and life experiences accumulated over many years and interact with many teachers with different levels of training. Thus the influence of even the most skilled clinical teachers may be difficult to measure.
Using the Stanford Faculty Development Program educational framework for describing teaching behaviors, and relying on an evaluation system of demonstrated reliability and validity as a measure of student learning and performance (12), we used a predictive model to determine the effect of clinical teaching on student performance during the medicine clerkship and to identify specific teaching behaviors that are effective. Our hypothesis was that higher teacher ratings would be associated with better student performance.

From the Pulmonary/Critical Care Medicine Service (SAR), Walter Reed Army Medical Center, Washington, DC, and the F. Edward Hebert School of Medicine (LP), Uniformed Services University of the Health Sciences, Bethesda, Maryland.
The opinions contained in this article solely represent the views of the authors and are not to be construed as representing the views of the Department of Defense or the Department of the Army.
Requests for reprints should be addressed to Stuart A. Roop, MD, Pulmonary/Critical Care Medicine Service (SAR), Walter Reed Army Medical Center, 6900 Georgia Avenue NW, Washington, DC 20307-5001.
Manuscript submitted August 20, 1999, and accepted in revised form September 26, 2000.
©2001 by Excerpta Medica, Inc. All rights reserved.
0002-9343/01/$–see front matter
PII S0002-9343(00)00672-0

Table 1. Selected Statements from Student Critiques*
Category
Learning climate
    My intern was interested in my patients.
    My resident was interested in teaching.
    My attending allowed me to present my patients on rounds.
    Students not presenting the patient were kept involved in bedside sessions.
Understanding and retention
    My intern helped me to learn the interpretation of basic laboratory tests.
    My resident tried to assure variety in my patient load.
    My attending reviewed important physical findings at the bedside.
    The preceptor tried to have us cover common, serious medical problems in our sessions.
Control of session
    My intern helped me budget time and read and complete my written work-ups and preceptor assignments.
    My resident assigned my patients to me.
    My attending was prompt for rounds.
    Preceptor sessions were held as planned.
Communication of goals
    My preceptor expected my written case discussions within the week of being assigned the patient.
    The attending set educational goals for my patient contacts.
Feedback
    My intern gave me clear, timely “feedback” on my progress.
    My resident reviewed my written data base with me (History and Physical Exam, Laboratory Data, Problem List).
Evaluation
    My attending observed/evaluated my clinical skills and knowledge.
    My preceptor observed me do a history and physical examination.
* The actual critiques consisted of 10 statements about intern teaching behaviors, 15 about resident teaching behaviors, 9 about attending physician teaching behaviors, and 15 about preceptor teaching behaviors.

METHODS
From August 1992 to June 1994, 314 third-year students
at the Uniformed Services University of the Health Sciences School of Medicine (all students in 2 consecutive
years) were asked to fill out critiques evaluating the skills
of each clinical teacher. Two consecutive 6-week clerkship rotations were completed at 2 of 5 core teaching
hospitals. The critiques were completed before the students knew their final grade or test scores, although regular feedback was provided to the students throughout
the rotation. Students on the inpatient wards evaluated
interns, residents, and attending physicians; students on
ambulatory medicine rotations evaluated each clinic attending physician with whom they spent at least 5 half-days. Both groups evaluated their small-group preceptors. Individual teaching behaviors were rated on a Likert
scale (1 = strongly disagree, 5 = strongly agree). The
critiques consisted of 10 to 15 statements that were designed to assess teaching behaviors expected of each level
of teacher within the 7 educational categories of the Stanford Faculty Development Program framework (2).
Statements were classified into 1 of the 7 educational categories by consensus among the authors and 3 independent educators who are trained facilitators in the Stanford
Program (Table 1).
The ratings from each student for each teacher were
recorded. A cumulative score that included all questions
from all educational categories was calculated to determine the student’s overall rating of the clinical teaching to
which he or she was exposed. Four facilitators trained in
the Stanford program grouped teaching behaviors into
the 7 Stanford educational categories (there was agreement among 3 of the 4 facilitators for 50 of the 59 items),
and mean scores for each category (using the 50 items)
were determined. We determined the effect of the measured teaching behaviors on performance outcomes for
students categorized by prior grade point average (GPA),
age, sex, marital status, and undergraduate major.
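The scoring described above (one cumulative rating over all items, plus a mean for each Stanford category) might be computed roughly as follows. This is a minimal sketch, not the authors' procedure: the item names and the item-to-category mapping are hypothetical placeholders, and it assumes the cumulative score is a simple mean of all Likert ratings.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical item-to-category mapping for illustration only.
# None marks an item that was not assigned to any Stanford category.
ITEM_CATEGORY = {
    "resident_interested_in_teaching": "learning climate",
    "attending_prompt_for_rounds": "control of session",
    "intern_taught_lab_interpretation": "understanding and retention",
    "intern_gave_feedback": "feedback",
    "unclassified_item_example": None,
}

def summarize_ratings(ratings):
    """Summarize one student's Likert ratings (1 to 5), keyed by item name.

    Returns (cumulative mean over all items, dict of per-category means).
    Unclassified items count toward the cumulative mean only.
    """
    overall = mean(ratings.values())
    by_category = defaultdict(list)
    for item, score in ratings.items():
        category = ITEM_CATEGORY.get(item)
        if category is not None:
            by_category[category].append(score)
    return overall, {c: mean(v) for c, v in by_category.items()}

# Made-up ratings from a single student.
example = {
    "resident_interested_in_teaching": 5,
    "attending_prompt_for_rounds": 4,
    "intern_taught_lab_interpretation": 4,
    "intern_gave_feedback": 3,
    "unclassified_item_example": 2,
}
overall, category_means = summarize_ratings(example)
```

Keeping unclassified items in the cumulative score but out of the category means mirrors the text, where the cumulative score used all questions and the category means used only the classified items.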
The relative contributions of variables in modeling
student performance and student growth were examined
using multiple stepwise linear regression, with r² representing the proportion of the variance in each outcome
variable that could be accounted for by the teaching
scores.
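To illustrate how a multiple r² and an increase in r² arise as predictors are added, the sketch below fits nested least-squares models on synthetic data. It is a simplification under stated assumptions: predictors are entered in a fixed order chosen by the caller, whereas a true stepwise procedure selects the entry order from the data, and none of the numbers come from the study. The function names are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def incremental_r2(predictors, y, order):
    """Enter predictors in the given order.

    Returns a list of (name, multiple R^2, increase in R^2) per step.
    """
    steps, used, prev = [], [], 0.0
    for name in order:
        used.append(predictors[name])
        r2 = r_squared(np.column_stack(used), y)
        steps.append((name, r2, r2 - prev))
        prev = r2
    return steps

# Synthetic data only: 293 simulated students with two standardized predictors.
rng = np.random.default_rng(0)
n = 293
gpa = rng.normal(size=n)
teaching = rng.normal(size=n)
score = 0.5 * gpa + 0.25 * teaching + rng.normal(size=n)

steps = incremental_r2({"gpa": gpa, "teaching": teaching}, score, ["gpa", "teaching"])
for name, r2, gain in steps:
    print(f"{name}: multiple R^2 = {r2:.2f}, increase = {gain:.2f}")
```

Because the models are nested, the multiple r² cannot decrease from one step to the next, and the increases sum to the final multiple r².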


Table 2. Effects of Selected Factors on Clerkship Final Grade and Student Growth*

                                     Clerkship Total Score†                  Student Growth Across the Clerkship‡
Variables                            Multiple r²  Increase in r²  P Value    Multiple r²  Increase in r²  P Value
Preclerkship grade point average§    0.28                         <0.0001    NA**
Ratings of teachers’ behaviors‖      0.34         0.06            <0.0001    0.10                         0.003
Age                                  0.35         0.01            0.004      0.14         0.04            0.001
Sex                                  0.36         0.01            0.07       0.15         0.01            0.04
Marital status                       0.36                         0.6        0.15                         0.9
Undergraduate degree¶                0.36                         0.7        0.15                         0.9

* Using stepwise linear regression.
† Based on clinical performance score (72%), National Board of Medical Examiners Medicine subject examination (18%), a 3-hour open book essay examination of analytic ability (6%) and a multiple-choice test in the interpretation of laboratory values (4%).
‡ Calculated as the difference between clerkship final grade and a preclerkship performance score (see methods).
§ Grade point average at the completion of the second year of medical school.
‖ Cumulative measure (mean) of all clinical teaching scores from the student ratings of teaching behaviors.
¶ Undergraduate degrees were divided into traditional biologic sciences (or premedical degree) and nonbiologic sciences degrees.
** NA = not applicable: the preclerkship grade point average was one of the factors used in the growth outcome determination.

Student performance data were used to determine two
outcomes. The student clerkship total score was a composite of the clinical performance evaluation from teachers and three examination scores: the National Board of
Medical Examiners subject examination in internal medicine; a 30-minute multiple choice test in the interpretation of laboratory values; and a 3-hour open book, free
response examination of analytic ability, the Multi-Step
Examination (13,14), that assessed process skills in case
analysis. The clinical performance evaluation, which accounted for 72% of the clerkship total score, was derived
from rigorously developed formal evaluation sessions
that measured the students’ progress from reporter, to
interpreter, to manager/educator (12,15).
The second outcome was student growth. We determined a preclerkship performance score based on the
preclerkship GPA (weighted at 50%), the United States
Medical Licensing Examination, Step I (weighted at
40%), and the clerkship pretest, a 100-question in-house
examination based on the second-year clinical reasoning
course (weighted at 10%). To allow comparisons, the
clerkship total score and the preclerkship performance
score were converted to t scores (mean of 50, SD of 10).
Student growth was calculated by subtracting the preclerkship performance t score from the clerkship total t
score. Comparisons of continuous variables were made using Student’s t test.
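The growth calculation can be sketched as below. The 50/40/10 weights and the rescaling to a mean of 50 and SD of 10 come from the text; how the three preclerkship components were put on a common scale before weighting is not stated, so standardizing each component first is an assumption of this sketch. The function names and example values are hypothetical.

```python
import numpy as np

def z_scores(x):
    """Standardize to mean 0 and sample SD 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def t_scores(x):
    """Rescale to mean 50 and SD 10."""
    return 50.0 + 10.0 * z_scores(x)

def student_growth(gpa, step1, pretest, clerkship_total):
    """Clerkship total t score minus preclerkship performance t score.

    Assumption: each preclerkship component is standardized before the
    50% / 40% / 10% weights are applied.
    """
    pre = 0.5 * z_scores(gpa) + 0.4 * z_scores(step1) + 0.1 * z_scores(pretest)
    return t_scores(clerkship_total) - t_scores(pre)

# Made-up values for four students, for illustration only.
growth = student_growth(
    gpa=[2.5, 3.0, 3.5, 2.8],
    step1=[190, 210, 230, 200],
    pretest=[60, 70, 80, 65],
    clerkship_total=[40, 55, 70, 45],
)
```

Because both scores are rescaled within the same group of students, growth computed this way averages zero across that group; a positive value means a student ranked higher at the end of the clerkship than the preclerkship measures predicted.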

RESULTS
Complete preclerkship and postclerkship performance
data as well as student critiques were available for 293
(93%) of the 314 students who completed medicine
clerkships during the 2 academic years. A total of 2,817
critiques were completed (a mean of 9.6 per student). The number of critiques completed by the students varied
slightly, as some students worked with more than one
ward team. For 74 students (24%), one of the two 6-week
rotations was an ambulatory medicine rotation (13), in
which the clinical teachers did not include house staff.
The mean age of students in our study was 27 ± 3 years
(range 23 to 35); 79% (n = 248) were men, 52% (n =
163) were married, and 74% (n = 232) had undergraduate degrees in the basic life sciences (traditional premedical majors). Approximately 20% (n = 18) of the house staff and 10% (n = 11) of the teaching staff were women,
reflecting the greater proportion of male physicians in a
military teaching hospital.
The mean clerkship total score was 53 ± 22 (range −15
to 104), and the mean student growth measured across
the clerkship was 0.4 ± 9.1 (range −36 to 47). The mean
clerkship total score was higher (63 ± 18 versus 42 ± 21,
P <0.0001) for the 152 students with a preclerkship
GPA above the mean (>2.92). Students performed better
during the second academic semester (clerkship total
score 57 ± 21 versus 49 ± 23, P = 0.02). There was no
significant difference in clerkship total scores when students were grouped by age, sex, marital status, or undergraduate education.
In multivariate linear regression models, the students’
preclerkship GPA accounted for almost 30% of the variance of the students’ clerkship total score (Table 2). Overall rating of clinical teaching accounted for about 6% of
the variance in clerkship total score and 10% of the variance in student growth.
During the 12-week clerkship, each student was exposed to a mean of 10 ± 2 supervising teachers who provided input to the clinical grade. The mean clinical evaluation score based on teachers’ evaluations of the students’ clinical performance was 40 ± 18 (range −19 to 76), with an average intraclass correlation of 0.83. The
average rating by the students of clinical teaching behaviors was 4.3 ± 0.4 (range 1 to 5). Mean ratings by students
with a clinical evaluation score above the mean were
somewhat higher than among students with a clinical
evaluation score below the mean (4.4 ± 0.4 versus 4.2 ±
0.5, P <0.001). For students with a clinical evaluation
score below the mean, reported teaching behaviors accounted for 13% (P = 0.03) of the variance in student
growth, compared with 5% (P = 0.4) for students with a
clinical evaluation score above the mean.
For the student growth outcome, variance attributable
to reported teaching behaviors was greater for students
with a preclerkship GPA above the mean (r² = 16%)
compared with students with preclerkship GPA below
the mean (r² = 6%); for older students (age above the
mean, r² = 24%) compared with younger students (r² =
7%), and for students with nonscience undergraduate
majors (r² = 33%) compared with those who were basic
science undergraduate majors (r² = 9%). Teaching behaviors that significantly correlated with performance for
students in the upper half of the class included the following: “my attending directed my reading to important issues in my patients,” “my resident assigned my patients to
me,” and “my intern helped me learn the interpretation
of basic lab tests.” Behaviors that correlated with performance for students in the lower half of the class included
“my attending was available for discussions on my patients,” “preceptor sessions were held as planned,” and “my
resident made clear decisions about time off and Sundays.”
Reported resident behaviors had the most effect on student growth (r² =
6%, P <0.0001), followed by those of the student’s preceptor
(r² = 3%, P <0.05) and the attending physician (r² = 3%,
P <0.05). Removing the subset of
students (24%) whose clerkship included 6 weeks of ambulatory medicine from the overall analysis had no substantial effect on any of the results.
Teaching behaviors reflecting leadership style in control of session (r² = 5%, P = 0.0002) and those fostering
understanding and retention (r² = 4%, P = 0.001) explained the most variance in student growth, followed by
feedback (r² = 2%, P = 0.01) and learning climate (r² =
2%, P = 0.01).

DISCUSSION
During the relatively short period of a 12-week medicine
clerkship, reported teaching skills affected student learning, although the measured effect was modest. Skeff et al
(2) examined the effect of clinical teaching seminars on
teaching behaviors using faculty self-assessments and
student evaluations but did not look at student performance as an outcome. Lucas et al (11) demonstrated a
relation between student performance and teaching behaviors affecting learning climate (“involving students” and “showing respect”), but the study did not include
overall teaching performance and other teaching categories. By measuring student growth and performance and
by relating these to cumulative teaching rating and to
specific teaching behaviors, we were able to assess the
desired product of clinical education, namely, student
learning.
In our study, the most important predictor of student
performance was preclerkship GPA, which accounted for
28% of the variance in total clerkship score. This is not
unexpected, as we would anticipate that students who
performed well during their first 2 years of medical school
would continue to perform well. However, reported
teaching behaviors were also correlated with student performance, more so than students’ age, sex, marital status,
or undergraduate degree.
The measurable effect of teaching was greater on student growth, namely, the change in student performance
across the clerkship. Growth was adjusted for preclerkship student performance, and thus more directly assesses student learning during the clerkship. The relative
weights assigned to the preclerkship performance variables, however, were judgments. The preclerkship GPA
reflects the student’s cumulative work and learning habits
from the first 2 years of medical school and was therefore
given the most weight. The United States Medical Licensing Examination, Step I, a standardized examination with
demonstrated reliability and predictive validity, was
given more weight than the in-house clerkship pretest.
Among the levels of teachers evaluated, the resident
had the most effect on student performance. As team
leader, the resident may have the most direct influence on
the learning environment. The influence of residents on
the students’ learning seen in this study supports the initiatives of many institutions to include house staff in faculty development programs.
The effect of teaching on student growth during the
clerkship was greater for students with a higher preclerkship GPA. This is especially notable because the preclerkship GPA is weighted heavily in the preclerkship performance score; positive growth requires improvement
from that preclerkship level. Reported teaching behaviors
also showed greater effect on performance for older students and for those without a traditional premedical undergraduate degree. These results suggest that some
teaching behaviors have a stronger influence on the learning of academically stronger and, perhaps, more mature
students. Perhaps these students have a more solid
knowledge base upon which to build their clinical skills,
better listening skills, or greater motivation. Older students with nonscience undergraduate majors may have
had broader life experiences that affect their receptiveness
to certain teaching behaviors. The teaching behaviors
that mattered among the students in the lower half of the
class by GPA tended to be those that emphasized time management, leadership style, and involvement of learners, whereas for the upper half of the class, fostering active
learning was more important.
We used an evaluation program that has been rigorously developed and well studied (12,13). Student performance was assessed using a combination of quantified
scores on examinations and descriptive clinical grades
from teachers’ observations as part of a highly structured
program using formal evaluation sessions. We have previously demonstrated the predictive validity of this evaluation system, relating clerkship performance to internship ratings (12). The reliability, or intraclass correlation,
of the clinical evaluations in this study was 0.83, and was
based on input from an average of 10 teachers for each
student. Previous studies have demonstrated that clinical
evaluations based on input from 7 or more teachers have a
reliability of greater than 0.8, which is comparable to
standardized multiple choice examinations and considered suitable for high-stakes decision making (16). Our
study is further strengthened in that it spanned 2 full academic years, was conducted at several teaching hospitals, included 314 students, and had a response rate of
more than 93%.
One limitation of this study is that although the validity and reliability of student ratings of clinical teachers
using the Stanford framework have been demonstrated
(3,4,17), this method does rely on students’ perspectives
of their teachers’ behaviors rather than ratings by an outside observer. During clinical clerkships, the majority of a
student’s grade (72% in our program) is based on clinical
evaluations from teachers. A reciprocal relation between
students’ perceptions of teachers’ behaviors and teachers’
evaluations of students’ performance cannot be ignored—students and teachers who liked (or disliked)
each other may be more likely to provide favorable (or
unfavorable) evaluations. In fact, our study showed that,
on average, students who received higher clinical evaluation scores did rate their teachers’ behaviors more favorably. Teaching during clinical clerkships is characterized
by small group interaction, and it may not be possible to
isolate the “reciprocal reward” phenomenon from accurate evaluations. For example, students who enjoy their
teachers may learn more, and teachers whose students are
engaged and enthusiastic may spend more time and use
more effective teaching behaviors. To minimize this effect, we used student reports of specific teaching behaviors rather than their general assessments of overall
teacher skills. More importantly, because the clinical
evaluation for each student consisted of structured input
from an average of 10 clinical teachers, the effect of any
favoritism should be lessened. Another limitation of this
study is that it involved only students at a single medical
school during the medicine clerkship. The findings may
not apply to other clerkships or to postgraduate training
in medicine.

In summary, we measured the effect of perceived clinical teaching behaviors on performance during a medicine clerkship and linked overall teaching behaviors to
medical student performance. This study provides an additional validation of the Stanford Faculty Development
Program framework using student learning as the measured outcome. Among the educational categories, teaching behaviors that reflect leadership style and foster understanding and retention were the most effective.

ACKNOWLEDGMENTS
The authors thank Andrew Shorr, MD, MPH, for review of the
manuscript, and David Cruess, PhD, for statistical review.

REFERENCES
1. Gage NL. Hard Gains in the Soft Sciences: The Case of Pedagogy. Bloomington, Ind: Phi Delta Kappa; 1985:1.
2. Skeff KM, Stratos GA, Berman J, Bergen MR. Improving clinical teaching: evaluation of a national dissemination program. Arch Intern Med. 1992;152:1156–1161.
3. Litzelman DK, Stratos GA, Marriot DJ, Skeff KM. Factorial validation of a widely disseminated educational framework for evaluating clinical teachers. Acad Med. 1998;73:688–695.
4. Marriot DJ, Litzelman DK. Students’ global assessments of clinical teachers: a reliable and valid measure of teaching effectiveness. Acad Med. 1998;73(suppl 10):S72–S74.
5. Dewitt TG, Goldberg RL, Roberts KB. Developing community faculty. Principles, practice and evaluation. Am J Dis Child. 1993;147:49–53.
6. Albright CL, Farquhar JW, Fortmann SP, et al. Impact of a clinical preventive medicine curriculum for primary care faculty: results of a dissemination model. Prev Med. 1992;21:419–435.
7. Patridge MI, Harris IB, Petzel RA. Implementation and evaluation of a faculty development program to improve clinical teaching. J Med Educ. 1980;55:711–713.
8. Ramsbottom-Lucier MT, Gillmore GM, Irby DM, Ramsey PG. Evaluation of clinical teaching by general internal medicine faculty in outpatient and inpatient settings. Acad Med. 1994;69:152–154.
9. Williams BC, Stern DT, Pillsbury MS. Validating a global measure of faculty teaching performance. Acad Med. 1998;73:614–615.
10. Skeff KM, Stratos GA, Mygdal WK, et al. Clinical teaching improvement: past and future for faculty development. Fam Med. 1997;29:257–257.
11. Lucas CA, Benedek D, Pangaro L. Learning climate and students’ achievement in a medicine clerkship. Acad Med. 1993;68:811–812.
12. Lavin B, Pangaro L. Internship ratings as a validity outcome measure for an evaluation system to identify inadequate clerkship performance. Acad Med. 1998;73:998–1002.
13. Pangaro L, Gibson K, Russel W, Lucas C, Marple R. A prospective randomized trial of a six-week ambulatory medicine rotation. Acad Med. 1995;70:537–541.
14. Elnicki DM, Ainsworth MA, Magarian GJ, Pangaro LN. Evaluating the internal medicine clerkship: a CDIM commentary. Am J Med. 1994;97:1–6.
15. Pangaro L. A new vocabulary and other innovations for improving descriptive in-training evaluations. Acad Med. 1999;74:1203–1207.
16. Carline JD, Paauw DS, Thiede KW, Ramsey PG. Factors affecting the reliability of ratings of students’ clinical skills in a medicine clerkship. J Gen Intern Med. 1992;7:506–510.
17. Donnelly MB, Woolliscroft JO. Evaluation of clinical instructors by third-year medical students. Acad Med. 1989;64:159–164.

February 15, 2001

THE AMERICAN JOURNAL OF MEDICINE®

Volume 110