
“If you wait to do everything until you're sure it's right, you'll probably never do much of anything.” (Win Borden)

PED4470
TESTING KNOWLEDGE IN PE
Ovande Furtado, M.S.
Kinesiology and Sports Studies Department
Purposes of Knowledge Testing
 Assigning a grade (summative evaluation)
 Measuring progress (formative evaluation)
 Providing feedback to students or participants
 Motivating students or participants
 Documenting effectiveness
 Assessing instructional effectiveness
Levels of Knowledge
 Understanding ranges from superficial to thorough.
 Bloom’s taxonomy of educational objectives
Knowledge
Comprehension
Application
Analysis
Synthesis
Evaluation
A good Knowledge Test:
 Fair
 Covers the course content
 Uses appropriate test format
 Clear and understandable
 Reliable and valid
Types of Knowledge Tests
 Essay vs. Objective
 Advantages and disadvantages
 Mastery vs. Discrimination
Mastery Test
 Determines whether students have mastered the material.
 Information everyone is expected to know.
 Formative evaluation with criterion-referenced standards (e.g., graded pass-fail).
 Performance standards tend to be high.
 e.g., 80% to 90% correct for passing.
Discrimination Test
 Purpose is to differentiate among
students.
 Items are written to discriminate among
different knowledge levels.
 Items should come from higher levels of Bloom’s taxonomy.
Which one to use?
 It depends on how the scores will be used.
 Ex. 1: Soccer Rules
 Mastery test for formative evaluation
 Achieves 80% > Go on…
 Same for an adult program
 Knowing basic rules is enough?
 Test ability to apply, analyze and synthesize the rules
 Discrimination test for formative evaluation
Types of Test Items
Closed-ended (“objective”)
 correct answer is provided
 do not test partial knowledge
 may reward guessing

Types include:
 true/false
 multiple choice
 matching
Types of Items
Open-ended (“subjective”)
 student provides correct answer
 rewards partial knowledge
 easier to construct, more difficult to grade

Types include:
 short answer
 fill in the blank
 essay
Item Construction Guidelines
 Make your own tests
 4 steps
1. Table of specifications
2. Type of test (essay/objective, true-false/MC)
3. Construct test items
4. Format and administrative details
Table of Specifications
 Content objectives:
 history, values, equipment, etiquette, safety, rules, strategy, techniques of play
 Educational objectives:
 knowledge, comprehension, application, analysis, synthesis, evaluation
Planning the Test
 What to measure?
 How to measure?
 When to test?
 How many questions?
 What type of test format should be used?
 What type of questions should be used?
Multiple choice items
 Avoid negatives, and especially double
(and triple) negatives
 Ensure that stem and foils are all
grammatically compatible, similar in
length and “parallel” in content
 Keep to factually or logically supportable
correct answers (avoid opinion)
Multiple choice items
 Avoid “irrelevant cues”
– e.g., always, never, common terms
between stem and correct answer
Common Errors
 Not all distractors are plausible.
 Item is ambiguous.
 Wording the correct response more precisely than the distractors.
 Specific determiners.
 Irrelevant clues.
 Grammatical clues.
Question 1: In golf, a shot made in two strokes under par is an:

birdie
bogey
double bogey
eagle

Question 1: In golf, a shot made in two strokes under par is an:

a) birdie
b) bogey
c) double bogey
d) eagle*

Question 1 (revised): In golf, what term is used for a shot made in two strokes under par?

a) birdie
b) bogey
c) double bogey
d) eagle

(The revised stem removes a grammatical clue: the article “an” fits only “eagle.”)
Essay questions
 Uses and advantages
 Limitations
 Inability to obtain a wide sample
 Inconsistencies in scoring procedures (the matter
of objectivity)
 Difficulties in analyzing test effectiveness
 Recommendations for construction
 Recommendations for scoring
What is next?
 We know how to plan and administer
 Need to master item analysis
 What is item analysis?
 Verify how items behave
 Refine your testing
Item Analysis
 Item Difficulty
 Percentage of testers who answered the item
correctly
 Item Discrimination
 Discriminates among test-takers
 Answered correctly by more knowledgeable
students.
 Missed by less knowledgeable students.
 How well the item "functions" in the test
Serves Several Purposes
 Find flaws in the test
 Two right answers
 Too difficult (re-teach concept)
 Too easy (no need for further teaching)
 Analyze wrong answers to identify misconceptions
How is it done?
• Identify upper and lower groups.
• Upper group is top 27% of scores.
• Lower group is bottom 27% of scores.
• The upper and lower groups must have the same number of scores.
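The grouping steps above can be sketched in code (an illustrative snippet, not from the slides; the function name, score values, and use of rounding for the group size are my own assumptions):

```python
# Illustrative sketch: form the upper and lower 27% groups from total scores.
# round() sets the group size, so a class of 31 gives 8 students per group.

def upper_lower_groups(scores, fraction=0.27):
    """Return (upper, lower) groups of equal size from a list of scores."""
    n_group = round(len(scores) * fraction)   # 31 * 0.27 = 8.37 -> 8
    ranked = sorted(scores, reverse=True)     # highest totals first
    return ranked[:n_group], ranked[-n_group:]

# Hypothetical class of 31 total test scores
scores = [55, 72, 88, 64, 91, 47, 83, 76, 69, 58, 94, 62, 71, 80, 66,
          53, 90, 77, 61, 85, 49, 68, 74, 59, 87, 63, 79, 56, 82, 70, 45]
upper, lower = upper_lower_groups(scores)
print(len(upper), len(lower))  # 8 8
```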
Item Difficulty (D or p)
• Percentage of students who choose the correct answer.
• D is high when the item is easy.
– if 90% got the item correct, D = .90
– if 30% got the item correct, D = .30
Item Difficulty (D or p)

D = (# correct in upper + # correct in lower) ÷ (# in upper + # in lower)
Example
31 students: 8 in upper group, 8 in lower group

Item 1   A   B   C*   D   E
Upper    2   1   5    0   0
Lower    1   3   2    1   1

D = (5 + 2) ÷ (8 + 8) = .4375
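The formula and worked example can be checked with a short snippet (illustrative; the function name is mine):

```python
# Item difficulty D: share of upper + lower group members answering correctly.
def item_difficulty(correct_upper, correct_lower, n_upper, n_lower):
    return (correct_upper + correct_lower) / (n_upper + n_lower)

# Item 1: 5 of 8 upper and 2 of 8 lower chose the keyed answer C.
print(item_difficulty(5, 2, 8, 8))  # 0.4375
```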
Item Discrimination
• How well item discriminates between those who 
did well and those who did poorly on test.
• Correlation between scores on one item and scores 
on total test.
• Range: −1.0 to +1.0
• Try guessing…
Discrimination Index ( r )
• Hard items have difficulties less than 0.35.
• Easy items have difficulties above 0.85.
• Items that are too easy or too difficult won't contribute to the test’s reliability.
Discrimination Index ( r )
r = (# correct in upper − # correct in lower) ÷ (# in one group)
Discrimination Index ( r )
31 students:  8 in upper group, 8 in lower group
Item 1   A   B   C*   D   E
Upper    2   1   5    0   0
Lower    1   3   2    1   1

r = (5 − 2) ÷ 8 = .375
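Likewise, the discrimination computation can be sketched (illustrative; the function name is mine):

```python
# Discrimination index r: difference in correct counts between the two
# equal-size groups, divided by the size of one group.
def discrimination_index(correct_upper, correct_lower, n_per_group):
    return (correct_upper - correct_lower) / n_per_group

# Item 1: 5 of 8 upper vs. 2 of 8 lower answered correctly.
print(discrimination_index(5, 2, 8))  # 0.375
```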
Discrimination Index ( r )
 > .40  Very Good (Excellent)
 .20 to .39  Acceptable (Good)
 .00 to .19  Revise or Discard
 Negative  Poor (Revise or Discard)
Examples
31 students:  8 in upper group, 8 in lower group
Item 2   A   B   C   D*   E
Upper    0   1   0   4    3
Lower    2   2   0   4    0

D = (4 + 4) ÷ (8 + 8) = .50
r = (4 − 4) ÷ 8 = 0.00
Examples
31 students:  8 in upper group, 8 in lower group
Item 3   A*   B   C   D   E
Upper    2    1   4   0   1
Lower    5    1   1   1   0

D = (2 + 5) ÷ (8 + 8) = .4375
r = (2 − 5) ÷ 8 = −.375
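Putting the two indices together for all three example items, with the interpretation bands from the earlier guideline slide (an illustrative sketch; the dictionary of counts restates the worked examples):

```python
# Compute D and r for the three example items and label each item's r
# using the bands from the guideline slide.

def interpret_r(r):
    """Map a discrimination index to the slide's interpretation bands."""
    if r > 0.40:
        return "Very Good (Excellent)"
    if r >= 0.20:
        return "Acceptable (Good)"
    if r >= 0.00:
        return "Revise or Discard"
    return "Poor (Revise or Discard)"

# (correct in upper, correct in lower) for items 1-3; 8 students per group
items = {1: (5, 2), 2: (4, 4), 3: (2, 5)}
for item, (u, l) in items.items():
    D = (u + l) / 16
    r = (u - l) / 8
    print(f"Item {item}: D = {D:.4f}, r = {r:+.3f}  ({interpret_r(r)})")
```

Item 1 lands in the acceptable band, while items 2 and 3 would be revised or discarded despite their reasonable difficulties.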
Are D and r related ?
• Item difficulties affect the maximum attainable discrimination index.
• Highest r (1.0) is possible only when D = .50.
• As D goes up or down from .50, the maximum possible r decreases.
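With equal-size groups this relationship can be made concrete: fixing D pins down the sum of the two correct counts, so their difference is largest when one group's count hits its bound, giving a maximum r of 2·min(D, 1 − D). This closed form is my own derivation from the two formulas above, not stated on the slide:

```python
# Maximum attainable r for a given difficulty D, assuming equal-size groups:
# u + l is fixed by D, so u - l peaks when one count reaches its limit.
def max_r(D):
    return 2 * min(D, 1 - D)

for D in (0.10, 0.30, 0.50, 0.70, 0.90):
    print(f"D = {D:.2f} -> max r = {max_r(D):.2f}")
```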