“If you wait to do everything until you're sure it's right, you'll probably never do much of anything.” (Win Borden)

Kinesiology and Sports Studies Department

Ovande Furtado, M.S.

Purposes of Knowledge Testing
• Assigning a grade (summative evaluation)
• Measuring progress (formative evaluation)
• Providing feedback to students or participants
• Motivating students or participants
• Documenting effectiveness
• Assessing instructional effectiveness

Levels of Knowledge
• Understanding ranges from superficial to thorough.
• Bloom’s taxonomy of educational objectives:
  Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation

A good Knowledge Test:
• Is fair
• Covers the course content
• Uses an appropriate test format
• Is clear and understandable
• Is reliable and valid

Types of Knowledge Tests
• Essay vs. objective: advantages and disadvantages
• Mastery vs. discrimination

Mastery Test
• Determines whether students have mastered the material.
• Covers information everyone is expected to know.
• Formative evaluation with criterion-referenced standards (e.g., graded pass/fail).
• Performance standards tend to be high, e.g., 80% to 90% correct for passing.

Discrimination Test
• Purpose is to differentiate among students.
• Items are written to discriminate among different knowledge levels.
• Items should come from the higher levels of Bloom’s taxonomy.

Which one to use?
• It depends on how the scores will be used.
• Ex. 1: Soccer rules. Use a mastery test for formative evaluation: a student
  who achieves 80% goes on. The same approach works for an adult program.
• Is knowing the basic rules enough? To test the ability to apply, analyze,
  and synthesize the rules, use a discrimination test for formative evaluation.

Types of Test Items
Closed-ended (“objective”)
• The correct answer is provided.
• Does not test partial knowledge.
• May reward guessing.
Types include:
• true/false
• multiple choice
• matching

Types of Items
Open-ended (“subjective”)
• The student provides the correct answer.
• Rewards partial knowledge.
• Easier to construct, more difficult to grade.
Types include:
• short answer
• fill in the blank
• essay

Item Construction Guidelines
Make your own tests in 4 steps:
1. Table of specifications
2. Type of test (essay/objective, true-false/MC)
3. Construct test items
4. Format and administrative details

Table of Specifications
• Content objectives: history, values, equipment, etiquette, safety, rules, strategy, techniques of play
• Educational objectives: knowledge, comprehension, application, analysis, synthesis, evaluation
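A table of specifications can be sketched as a grid that crosses content objectives with educational objectives and records a planned item count in each cell. The Python sketch below uses the objective lists from this slide, but the counts themselves are made-up illustrations, not numbers from the course:

```python
# Content and educational objectives from the slide.
content = ["history", "values", "equipment", "etiquette",
           "safety", "rules", "strategy", "techniques of play"]
educational = ["knowledge", "comprehension", "application",
               "analysis", "synthesis", "evaluation"]

# Grid of planned item counts; every cell starts at zero.
table = {c: {e: 0 for e in educational} for c in content}

# Illustrative entries (made-up numbers): 2 knowledge items and
# 1 application item on rules, plus 2 analysis items on strategy.
table["rules"]["knowledge"] = 2
table["rules"]["application"] = 1
table["strategy"]["analysis"] = 2

# The grand total shows how many items the plan calls for.
total_items = sum(n for row in table.values() for n in row.values())
print(total_items)  # 5
```

Summing rows and columns of such a grid makes it easy to check that the test balances content areas against cognitive levels before any items are written.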

Planning the Test
• What to measure?
• How to measure?
• When to test?
• How many questions?
• What type of test format should be used?
• What type of questions should be used?

Multiple choice items
• Avoid negatives, and especially double (and triple) negatives.
• Ensure that the stem and foils are all grammatically compatible, similar in length, and “parallel” in content.
• Keep to factually or logically supportable correct answers (avoid opinion).

Multiple choice items
• Avoid “irrelevant cues”
  – e.g., always, never, common terms between stem and correct answer

Common Errors
• Not all distractors are plausible.
• Item is ambiguous.
• Correct response is worded more precisely than the distractors.
• Specific determiners.
• Irrelevant clues (e.g., grammatical).

Question 1: In golf, a shot made in two strokes under par is an:
a) birdie
b) bogey
c) double bogey
d) eagle

Question 1: In golf, a shot made in two strokes under par is an:
a) birdie
b) bogey
c) double bogey
d) eagle*
(The article “an” in the stem is a grammatical cue to the correct answer.)

Question 1: In golf, what term is used for a shot made in two strokes under par?
a) birdie
b) bogey
c) double bogey
d) eagle
(Rewording the stem removes the grammatical cue.)

Essay questions
• Uses and advantages
• Limitations:
  – Inability to obtain a wide sample
  – Inconsistencies in scoring procedures (the matter of objectivity)
  – Difficulties in analyzing test effectiveness
• Recommendations for construction
• Recommendations for scoring

What is next?
• We know how to plan and administer.
• We need to master item analysis.
• What is item analysis?
  – Verify how items behave
  – Refine your testing

Item Analysis
Item Difficulty
• Percentage of test takers who answered the item correctly.
Item Discrimination
• How well the item discriminates among test takers:
  – Answered correctly by more knowledgeable students.
  – Missed by less knowledgeable students.
• Indicates how well the item “functions” in the test.

Serves Several Purposes
• Find flaws in the test (e.g., two right answers).
• Identify items that are too difficult (re-teach the concept).
• Identify items that are too easy (no need for further teaching).
• Examine wrong answers to identify misconceptions.

How is it done?
• Identify upper and lower groups.
• Upper group is the top 27% of scores.
• Lower group is the bottom 27% of scores.
• The upper and lower groups must have the same number of scores.
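The grouping procedure above can be sketched in a few lines of Python; the function name `split_groups` and the sample scores are my own illustration:

```python
def split_groups(scores, fraction=0.27):
    """Split total test scores into equal-sized upper and lower groups.

    Each group holds `fraction` of the scores (27% by convention),
    rounded to a whole number of students, so the groups match in size.
    """
    ranked = sorted(scores, reverse=True)
    n = round(len(ranked) * fraction)
    return ranked[:n], ranked[-n:]  # (upper group, lower group)

# With 31 students, round(31 * 0.27) = 8 per group, as in the slides.
scores = list(range(70, 101))  # 31 illustrative total scores
upper, lower = split_groups(scores)
print(len(upper), len(lower))  # 8 8
```

Ties at the cutoff would need a policy of their own (e.g., include or exclude all tied scores); this sketch simply takes the first `n` after sorting.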

Item Difficulty (D or p)
• Percentage of students who choose the correct answer.
• D is high when the item is easy.
  – If 90% got the item correct, D = .90.
  – If 30% got the item correct, D = .30.

Item Difficulty (D or p)

D = (# correct in upper + # correct in lower) ÷ (# in upper + # in lower)

Example: 31 students; 8 in the upper group, 8 in the lower group.

Item 1 (correct answer: C)
        A    B    C*   D    E
Upper   2    1    5    0    0
Lower   1    3    2    1    1

D = (5 + 2) ÷ (8 + 8) = .4375
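The difficulty formula is easy to verify in code; this minimal sketch (function name mine) reproduces the Item 1 example:

```python
def item_difficulty(correct_upper, correct_lower, n_upper, n_lower):
    """D = (# correct in upper + # correct in lower) / (# in upper + # in lower)."""
    return (correct_upper + correct_lower) / (n_upper + n_lower)

# Item 1 above: 5 of 8 upper-group and 2 of 8 lower-group students
# chose the keyed answer C.
print(item_difficulty(5, 2, 8, 8))  # 0.4375
```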

Item Discrimination
• How well the item discriminates between those who did well and those who did poorly on the test.
• Correlation between scores on one item and scores on the total test.
• Range: -1.0 to +1.0.
• Try guessing…

Discrimination Index ( r )
• Hard items have difficulties less than 0.35.
• Easy items have difficulties above 0.85.
• Items that are too easy or too difficult won't contribute to the test’s reliability.

Discrimination Index ( r )

r = (# correct in upper - # correct in lower) ÷ (# in one group)

Discrimination Index ( r )

Example: 31 students; 8 in the upper group, 8 in the lower group.

Item 1 (correct answer: C)
        A    B    C*   D    E
Upper   2    1    5    0    0
Lower   1    3    2    1    1

r = (5 - 2) ÷ 8 = .375
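The same worked example can be reproduced in code (function name mine):

```python
def discrimination_index(correct_upper, correct_lower, n_per_group):
    """r = (# correct in upper - # correct in lower) / (# in one group)."""
    return (correct_upper - correct_lower) / n_per_group

# Item 1 above: 5 upper-group and 2 lower-group students answered correctly.
print(discrimination_index(5, 2, 8))  # 0.375
```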

Discrimination Index ( r )
  > .40         Very Good (Excellent)
  .20 to .39    Acceptable (Good)
  .00 to .19    Revise or Discard
  Negative      Poor (Revise or Discard)
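The rating scale above maps directly onto a small helper function. The cutoffs follow the slide; treating r = .40 itself as Very Good is my own choice at the boundary:

```python
def interpret_r(r):
    """Classify a discrimination index using the slide's rating scale."""
    if r < 0.0:
        return "Poor (Revise or Discard)"
    if r < 0.20:
        return "Revise or Discard"
    if r < 0.40:
        return "Acceptable (Good)"
    return "Very Good (Excellent)"

print(interpret_r(0.375))   # Acceptable (Good)
print(interpret_r(-0.375))  # Poor (Revise or Discard)
```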

Example: 31 students; 8 in the upper group, 8 in the lower group.

Item 2 (correct answer: D)
        A    B    C    D*   E
Upper   0    1    0    4    3
Lower   2    2    0    4    0

D = (4 + 4) ÷ (8 + 8) = .50
r = (4 - 4) ÷ 8 = 0.00

Example: 31 students; 8 in the upper group, 8 in the lower group.

Item 3 (correct answer: A)
        A*   B    C    D    E
Upper   2    1    4    1    0
Lower   5    1    1    1    0

D = (2 + 5) ÷ (8 + 8) = .4375
r = (2 - 5) ÷ 8 = -.375

Are D and r related?
• Item difficulties affect the maximum attainable discrimination index.
• The highest r (1.0) is possible only when D = .50.
• As D moves up or down from .50, the maximum possible r decreases.
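This relationship can be made concrete. With equal-sized upper and lower groups, r is largest when as many correct answers as possible come from the upper group, which works out to 2 * min(D, 1 - D). The derivation is my own sketch, consistent with the claims above:

```python
def max_attainable_r(d):
    """Highest possible discrimination index for an item of difficulty d.

    With equal-sized groups of n students each, the item has 2*n*d correct
    answers in total. r peaks when the upper group supplies as many of them
    as possible, giving r_max = 2 * min(d, 1 - d): exactly 1.0 at d = 0.50,
    and smaller as d moves toward 0 or 1.
    """
    return 2 * min(d, 1 - d)

for d in (0.10, 0.30, 0.50, 0.70, 0.90):
    print(d, max_attainable_r(d))  # r_max peaks at d = 0.50
```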
