
Introduction to Test Development
Graham McMahon, MD, MMSc.
Sarah E. Peyre, EdD
Educational Research Methods Program
Learning Objectives
 Understand the pros and cons of various question types for written examinations
 Learn how to determine
 Item difficulty and
 Item discrimination
 Understand the psychometrics of a high-stakes test
 Validity
 Reliability
 Standard Setting
Come to our Workshop!
 Work in small groups to…
 Review problematic multiple choice items
 Establish validity and reliability for a test

 Participate in standard setting exercise


Question Types – Pros and Cons
 Essay Items
 Short Answer and Completion Items
 Matching Items
 True-False and Multiple-Choice Tests
 Interviews
 Portfolios

….all can be scored and can be subject to test development


Multiple-Choice Items
 Stem: An 85-year-old woman has difficulty raising her arms above her head and combing her hair. She has morning aches in her shoulders and neck. Her reflexes are symmetrical and normal. There is no muscle tenderness or joint swelling.
 Lead-in: Which one of the following laboratory tests should be obtained to confirm the most likely diagnosis?
 Responses (one correct response; the others are distractors):
A. Anti-nuclear antibody.
B. Erythrocyte sedimentation rate.
C. Serum concentration of creatine kinase.
D. Serum concentration of angiotensin-converting enzyme.
E. Urine microscopy.
Tips for writing discriminant MCQs
 Be sure that each item reflects a clearly defined learning
outcome
 Stem
 The stem of the item should be self-contained and written in clear and
precise language.
 Avoid ‘trigger’ words (e.g. pill-rolling tremor)
 Avoid negatives, ‘except’ phrasing, absolutes and qualifiers in question stems.
 Responses
 All answers should be plausible and homogeneous
 Items need to be independent of one another
 Answer choices should be similar in length and grammatical form
 List answer choices in alphabetical or numerical order
 Avoid ‘all of the above’ as a response
 Avoid technical flaws (tense or plurality for example)
Pros and Cons of MCQs
Pros
 Useful for measuring learning outcomes at almost any level
 Easy to understand
 Easy to score
 Easily analyzed for effectiveness
 Allow broad coverage efficiently
Cons
 Good questions take a long time to write and are difficult to write
 Constrain creative responses from learners
 May have more than one correct answer
Item Analysis
 Qualitative: looks at whether the content
matches the information, attitude,
characteristic or behavior being assessed
 Quantitative:
 Item difficulty
 Item discrimination
Determining item difficulty
 The percentage of participants who get that item correct
 Item difficulty scores can range from 0 to 100%
 Low value = high difficulty
 High value = low difficulty
[Figure: number of students achieving each score (0 to 100) for a hard exam, a normal exam and an easy exam]
Item difficulty bands:
High (Difficult): <= 30%
Medium (Moderate): > 30% and < 80%
Low (Easy): >= 80%
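As a rough illustration of the definition above, the following Python sketch computes the percentage correct for each item and maps it to the three difficulty bands. The function names and the small 0/1 score matrix are hypothetical, made up for this example only.

```python
from typing import List


def item_difficulty(responses: List[List[int]]) -> List[float]:
    """Percentage of test takers who answered each item correctly.

    responses[s][i] is 1 if student s got item i right, else 0.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        100.0 * sum(row[i] for row in responses) / n_students
        for i in range(n_items)
    ]


def difficulty_band(percent_correct: float) -> str:
    """Map a percent-correct value to the bands <=30%, 30-80%, >=80%."""
    if percent_correct <= 30:
        return "High (Difficult)"
    if percent_correct < 80:
        return "Medium (Moderate)"
    return "Low (Easy)"


# Hypothetical data: 5 students x 3 items.
scores = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
]
p_values = item_difficulty(scores)
bands = [difficulty_band(p) for p in p_values]
print(list(zip(p_values, bands)))
```

Note that a low percentage means a hard item, so the band labels run in the opposite direction to the numbers.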
Discrimination Index
 The Discrimination Index distinguishes for each item between the performance of students who did well on the exam and students who did poorly.
 Index of discrimination:
 The % of people answering the item correctly in the upper extreme group minus the % answering correctly in the lower extreme group
 Item discrimination scores can range from -1.00 to +1.00
 Example
 100 test takers: 20 in the top 25 were correct but only 5 in the lowest 25 students were correct.
 DI = (20-5)/25 = 0.6
Review guide by item discrimination (D) and item difficulty:
Item Discrimination (D)   High     Med      Low
D <= 0%                   review   review   review
0% < D < 30%              ok       review   ok
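The index can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and forming the extreme groups from the top and bottom quarter of total scores is an assumption based on the "top 25 of 100" example.

```python
from typing import List


def discrimination_index(upper_correct: int, lower_correct: int,
                         group_size: int) -> float:
    """Difference in proportion correct between upper and lower groups."""
    return (upper_correct - lower_correct) / group_size


def index_from_scores(item_correct: List[int], total_scores: List[float],
                      fraction: float = 0.25) -> float:
    """Build the extreme groups from total exam scores, then compute the index.

    item_correct[s] is 1 if student s got this item right, else 0.
    fraction=0.25 (top and bottom quarter) is an assumption.
    """
    # Rank students from highest to lowest total score.
    order = sorted(range(len(total_scores)),
                   key=lambda s: total_scores[s], reverse=True)
    k = max(1, int(len(order) * fraction))
    upper, lower = order[:k], order[-k:]
    return discrimination_index(
        sum(item_correct[s] for s in upper),
        sum(item_correct[s] for s in lower),
        k,
    )


# The slide's example: 20 of the top 25 and 5 of the lowest 25 were correct.
worked_example = discrimination_index(20, 5, 25)

# Hypothetical class of 8 students.
totals = [90, 85, 70, 65, 60, 50, 40, 30]
item = [1, 1, 1, 0, 1, 0, 0, 0]
small_class = index_from_scores(item, totals)
print(worked_example, small_class)
```

A value near +1 means the stronger students got the item right and the weaker students did not. A value at or below zero suggests the item deserves a second look.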
Item Analysis Report
[Figure: sample item analysis report listing order, ID and group number, with percentages on the left and counts on the right]
 The left half shows percentages, the right half counts.
 The correct option is indicated in parentheses.
 Point Biserial is similar to the discrimination index, but is not based on fixed upper and
lower groups. For each item, it compares the mean score of students who chose the
correct answer to the mean score of students who chose the wrong answer.
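One common way to compute a point biserial is shown below as a hedged sketch. It scales the difference between the two group means by the standard deviation of all total scores. The function name, the use of the population standard deviation, and the tiny data set are assumptions for illustration; reporting software may differ in detail (for example, by excluding the item itself from the total).

```python
from math import sqrt
from statistics import mean, pstdev
from typing import List


def point_biserial(item_correct: List[int],
                   total_scores: List[float]) -> float:
    """Point-biserial correlation between a 0/1 item and total score."""
    right = [t for c, t in zip(item_correct, total_scores) if c == 1]
    wrong = [t for c, t in zip(item_correct, total_scores) if c == 0]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect answer")
    sd = pstdev(total_scores)  # population SD of all total scores
    if sd == 0:
        raise ValueError("total scores have no spread")
    p = len(right) / len(total_scores)  # proportion answering correctly
    return (mean(right) - mean(wrong)) / sd * sqrt(p * (1 - p))


# Hypothetical data: the two highest scorers got the item right.
r_pb = point_biserial([1, 1, 0, 0], [4, 3, 2, 1])
print(round(r_pb, 3))
```

Unlike the discrimination index, this uses every student's score rather than only the upper and lower groups.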
Test Validity
 Validity:
 The extent to which inferences made from a test are
appropriate, meaningful, or useful.
 Does my test measure what it is intended to measure?
 Content validity
 Expert review
 Criterion validity – Predictive/Concurrent
 Scores can be related to another known metric
 Construct validity
 Successfully differentiates between levels of learners
Kissing Cousins
 A test cannot be valid unless it is reliable.
Test Reliability
 Reliability: measures the underlying construct consistently (trustworthiness/stability)
 Test-retest reliability
 Alternate forms reliability
 Internal consistency reliability (Cronbach’s alpha)
 Inter-rater reliability
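For internal consistency, Cronbach's alpha is usually written as k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The sketch below applies that formula to a hypothetical score matrix. The function name and the data are illustrative only.

```python
from statistics import pvariance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for scores[student][item]."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across students.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each student's total score.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have no spread")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 students x 3 items, scored 0/1.
exam = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(exam)
print(alpha)
```

The same variance estimator has to be used for the items and for the totals. Population variance is used here, but sample variance gives the same ratio.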
How do I set a passing grade?
 Standard Setting
 Norm referenced: Z-scores
 Number of standard deviations below the mean
 Criterion referenced: Angoff Method
 A panel of experts is asked to evaluate each item and estimate the fraction of minimally competent students who would answer each item correctly
 Ratings are averaged across the experts for each item, discussed and then summed to get the panel's raw cut score
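Both approaches reduce to simple arithmetic, sketched below in Python. The function names, the expert ratings and the exam scores are hypothetical. The panel discussion step is not modeled, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev
from typing import List


def angoff_cut_score(ratings: List[List[float]]) -> float:
    """Raw cut score from ratings[expert][item].

    Each rating is the estimated fraction of minimally competent
    students who would answer that item correctly.
    """
    n_items = len(ratings[0])
    # Average across experts for each item, then sum over items.
    item_means = [mean([expert[i] for expert in ratings])
                  for i in range(n_items)]
    return sum(item_means)


def norm_referenced_cut(scores: List[float], z_below_mean: float) -> float:
    """Passing score set a chosen number of SDs below the class mean."""
    return mean(scores) - z_below_mean * pstdev(scores)


# Hypothetical panel: 3 experts rating a 4-item test.
panel = [
    [0.6, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.4, 0.9],
    [0.5, 0.9, 0.6, 0.9],
]
angoff = angoff_cut_score(panel)

# Hypothetical exam scores, with the pass mark one SD below the mean.
norm_cut = norm_referenced_cut([50, 60, 70, 80, 90], z_below_mean=1.0)
print(round(angoff, 2), round(norm_cut, 2))
```

The Angoff result is a raw score out of the number of items (here about 2.8 out of 4). The norm-referenced cut moves with the performance of the group taking the test.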
Thank you!
Welcome to Our Workshop on Test Development!
Graham McMahon, MD, MMSc.
Sarah E. Peyre, EdD
Educational Research Methods
The Academy at Harvard Medical School
Outline
 Learning Objectives
 Creating MCQ Items
 Item Template
 Item Flaws
 Tips for Success
 Establishing Validity and Reliability for a Test
 Mock Standard Setting
Item Creation
 Consider beginning with the end in mind
 What is it that you think the medical student should
demonstrate that he/she knows or knows how to do?
 This should be an objective from your lesson plan.

[Figure: diagram linking Objectives, Learning Activities and Evaluation]
Item Stems: Clinical Vignettes
 Things to consider:
 Patient description (46-year-old female)
 Functional disability (difficulty rising from a seated position, but has no difficulty flexing her legs)
 The question based on this item template:
 A 46-year-old female has difficulty rising from a seated position, but has no difficulty flexing her legs. Which of the following muscles has been injured?
[Objective: Identify and explain the function of the muscles in the…. ]
Item Creation
 Lead-in: The most likely diagnosis is
 Options: disorders, diseases
 Objective: Describe the signs and symptoms of X. Compare and contrast the signs and symptoms of X, Y and Z.
 Lead-in: Which of the following additional symptoms would you expect to be present?
 Options: symptoms
 Objective: same as above
 Lead-in: The most likely cause is
 Options: bacteria, toxins, medications, metabolic defects
 Objective: List and explain the causes of X.
 Lead-in: The most likely mechanism is
 Options: disease mechanisms, pharmacologic mechanisms
 Objective: Diagram and explain the mechanism of drug X.
Item Templates
 Other considerations:
 Age, gender, race, ethnicity
 Site of care (ER, office visit)

 Presenting complaint
 presents for a routine physical exam
 presents with a headache
 Duration
 Patient history, family history
 There is no history of…
 He has a history of…
 Physical findings
 Lab values, imaging studies, pathology reports
 Treatment, subsequent findings
Item Creation
 Add the lead-in (question) and the options
 Which of the following pulmonary variables is most
likely to be lower than normal in this patient?
A. Alveolar-arterial PO2 difference
B. Compliance of the lung
C. Oncotic pressure of the alveolar fluid
D. Work of breathing
E. Residual volume
Item Creation: Taking Recall up to Another Level
 Recall question:
What area is supplied with blood by the posterior inferior cerebellar artery?
[Objective: Identify the areas of the brain supplied by the major cerebral arteries.]
Item Creation: Taking Recall up to Another Level
Application question:
A 62-year-old man develops left-sided limb ataxia, Horner’s syndrome, nystagmus and loss of facial pain and temperature. Which artery is most likely to be occluded?
[Objective: Differentiate the signs and symptoms that would occur upon occlusion of each of the major cerebral arteries.]
Your Turn!
Review the distributed questions
and identify strengths and
weaknesses in each.
Question
 Acute intermittent porphyria is the result of a
defect in the biosynthetic pathway for
 A. collagen
 B. corticosteroid
 C. fatty acid
 D. glucose
 E. heme
Rewritten….
 An otherwise healthy 33-year-old male has mild weakness and
occasional episodes of steady, severe abdominal pain with
some cramping but no diarrhea. One aunt and a cousin have
had similar episodes. During an episode, his abdomen is
distended, and bowel sounds are decreased. Neurological
examination shows mild weakness in the upper arms. These
findings suggest a defect in the biosynthetic pathway for:
 A. collagen
 B. corticosteroid
 C. fatty acid
 D. glucose
 E. heme
Question
A 52-year-old male presents to the office with a one-week history
of flank pain and hematuria. Past medical history is
unremarkable. Physical examination reveals a left-sided
abdominal mass. The greatest risk factor for renal cell
carcinoma is
A. diabetes
B. female gender
C. hyperlipidemia
D. low body mass index
E. smoking
Question
Which of the following is a correct statement about
cystic fibrosis (CF)?
A. The incidence of CF is 1:2000.
B. Children with CF usually die in their teens.
C. Males with CF are sterile.
D. CF is an autosomal recessive disease.
E. Symptoms of CF only appear in infancy.

What other flaws can you detect in this question?


Item Flaws: Unfocused items
Which of the following is correct regarding [topic]?

There is not enough information in the stem to answer the question without looking at the options.
The responses are disparate. The distractors have to be 100% false. Thus, the question basically becomes a true/false question. Avoid these!
A 45-year-old man comes to the physician because of a 6-week history
of a non-productive cough. An X-ray film of the chest shows a 0.8 cm
well circumscribed peripheral nodule in the right lung. Biopsy shows a
necrotizing granuloma. Which of the following is the most likely
diagnosis?

(A) Pulmonary embolus
(B) Small cell carcinoma
(C) Pseudomonas aeruginosa infection
(D) Histoplasma capsulatum
(E) Herpes pneumonitis
(F) Metastatic renal cell carcinoma
A healthy 57-year-old woman comes to the physician
because of a 2 cm mass in her right breast. Biopsy
reveals an invasive ductal carcinoma. Which of the
following is the most important prognostic factor?

(A) High grade tumor cytology
(B) Infiltrative nature of tumor into benign breast
(C) Numerous mitotic figures
(D) Amount of tumor fibrosis
(E) Presence of lymph node metastasis
(F) Number of plasma cells in tumor
A 63-year-old man comes to the physician because of a 6-week history
of progressive dyspnea on exertion, orthopnea, and ankle edema. He
has received multiagent chemotherapy for Waldenström’s
macroglobulinemia for the past year. Urinalysis shows proteinuria. A
bone marrow biopsy shows a partial response to therapy with ongoing
marrow involvement still identified. Which of the following is the most
likely diagnosis?

(A) Cardiac amyloidosis
(B) Viral myocarditis
(C) Cardiac sarcoidosis
(D) Myocardial infarct
(E) Hypertrophic cardiomyopathy
A question submitted
In aortic stenosis what other abnormal heart
sounds might accompany the resulting
murmur?
A. Physiological splitting of S2
B. An accentuated S2
C. Paradoxical splitting of S2
D. A muffled S2
Revised question
A 60-year-old patient with an active lifestyle is found to
have a systolic murmur on a routine physical exam.
He currently has no symptoms. If this were aortic
stenosis, what other abnormal heart sounds might
accompany the systolic murmur?
A.) Physiological splitting of S2
B.) An accentuated S2
C.) Paradoxical splitting of S2
D.) A muffled S2
Summary
 Utilize action verbs to write objectives
 Write your exam items based on the objectives
 Tie the clinical vignette to the lead-in
 Choose appropriate options with one best answer
 Avoid technical flaws
 Utilize an item checklist to ensure that you have done all you can to write the best items possible.
 Pretest your items
Establishing Validity and Reliability (Groups)
Standard Setting (Groups)
Graham McMahon
gmcmahon@partners.org

Item Discrimination: Examples
Item   Number of Correct Answers in Group   Item Discrimination
No.    Upper 1/4        Lower 1/4           Index
1      90               20                   0.7
2      80               70                   0.1
3      100              0                    1
4      100              100                  0
5      50               50                   0
6      20               60                  -0.4
Number of students per group = 100
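The indices in this table can be checked with a short Python sketch. The variable names are hypothetical. The rule of flagging items with an index at or below zero follows the review guide given earlier for D <= 0%.

```python
# (item number, correct in upper 1/4, correct in lower 1/4), from the table.
rows = [
    (1, 90, 20),
    (2, 80, 70),
    (3, 100, 0),
    (4, 100, 100),
    (5, 50, 50),
    (6, 20, 60),
]
GROUP_SIZE = 100  # number of students per group

# Index = (upper correct - lower correct) / group size, rounded for display.
indices = {
    item: round((upper - lower) / GROUP_SIZE, 2)
    for item, upper, lower in rows
}

# Items that do not separate strong from weak students (index <= 0).
needs_review = [item for item, d in indices.items() if d <= 0]
print(indices)
print(needs_review)
```

Items 4 and 5 show that an index of zero can come from an item everyone gets right or from one that both groups find equally hard. Item 6 is the more worrying case, because the weaker group outperformed the stronger group.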
Distracter Analysis: Examples

Item 1 A* B C D E Omit
% of students in upper ¼ 20 5 0 0 0 0
% of students in the middle 15 10 10 10 5 0
% of students in lower ¼ 5 5 5 10 0 0

Item 2 A B C D* E Omit
% of students in upper ¼ 0 5 5 15 0 0
% of students in the middle 0 10 15 5 20 0
% of students in lower ¼ 0 5 10 0 10 0

(*) marks the correct answer.
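A table like this can be screened automatically. The sketch below applies two common rules of thumb, which are assumptions rather than rules stated on the slides: a distractor that nobody chooses is doing no work, and a distractor chosen more often by the upper group than the lower group may be ambiguous or miskeyed. The function and variable names are hypothetical.

```python
from typing import Dict, List

OPTIONS = ["A", "B", "C", "D", "E"]


def distractor_flags(upper: List[int], middle: List[int], lower: List[int],
                     correct: str) -> Dict[str, str]:
    """Flag questionable distractors from % of students choosing each option."""
    flags: Dict[str, str] = {}
    for i, option in enumerate(OPTIONS):
        if option == correct:
            continue  # only distractors are screened
        if upper[i] + middle[i] + lower[i] == 0:
            flags[option] = "never chosen"
        elif upper[i] > lower[i]:
            flags[option] = "attracts stronger students"
    return flags


# Percentages for options A-E copied from the table (Omit column left out).
item1 = distractor_flags([20, 5, 0, 0, 0], [15, 10, 10, 10, 5],
                         [5, 5, 5, 10, 0], correct="A")
item2 = distractor_flags([0, 5, 5, 15, 0], [0, 10, 15, 5, 20],
                         [0, 5, 10, 0, 10], correct="D")
print(item1, item2)
```

On these data only option A of Item 2 is flagged, because no student in any group selected it.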
