a. Planning Phase – the purpose of the test is identified, the learning outcomes to
be assessed are specified, and a table of specifications is prepared to guide the
item construction phase.
b. Item Construction Phase – test items are constructed following the
appropriate format for the specified learning outcomes of instruction.
c. Review Phase – the items are examined by the teacher or peers prior to
administration, based on a judgment of their alignment to the content and behavior
components of the instructional competencies, and after administration,
based on an analysis of students’ performance on each item.
A. Identifying the Purpose of the Test
A test seeks to uncover what students know and can do, providing feedback on what
they need to alter or work on further to improve their learning.
Teachers can use the results to map out their pedagogical moves and improve
teaching-learning through appropriate mentoring.
Alternatives or distracters in selected-response items (e.g., multiple-choice items)
can take the form of popular falsehoods, misconceptions, misinterpretations, or
inadequately stated principles that students are likely to adopt.
The teacher carefully selects the relevant content or skills to be covered in a test.
Given at the end of the instructional process, a test takes on a different purpose.
A Test Blueprint is a list of key components defining your test, including the
purpose of the test and the content framework.
Regardless of subject area, a classroom test covers the learning outcomes
intended and essential to be achieved within the unit or period of work.
This planning phase helps teachers make a genuine connection in the trilogy among
Curriculum, Instruction, and Assessment, known as the ALIGNMENT
TRIANGLE.
Key Points:
The Test Blueprint or Table of Specification gives evidence of content validity.
A common TOS shows the breakdown of learning outcomes by content area
and the cognitive process or behavior involved.
Miller, Linn, and Gronlund (2009) developed a table of specifications that breaks
down learning outcomes covering a wider cognitive domain. This table of
specifications is useful for detecting the areas the learners have gained
mastery in, and those that need additional instructional attention.
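A two-way TOS like the one described can be sketched programmatically. The topics, time shares, cognitive levels, and item count below are hypothetical, chosen only to illustrate allotting items in proportion to instructional time:

```python
# Hypothetical content areas with their share of instructional time,
# and the cognitive levels the TOS breaks outcomes into.
topics = {"Fractions": 0.40, "Decimals": 0.35, "Percent": 0.25}
levels = ["Remembering", "Understanding", "Applying"]
total_items = 40  # assumed test length

tos = {}
for topic, share in topics.items():
    items = round(total_items * share)          # items allotted to this topic
    base, extra = divmod(items, len(levels))    # spread across cognitive levels
    tos[topic] = {lvl: base + (1 if i < extra else 0)
                  for i, lvl in enumerate(levels)}
```

Summing every cell recovers the intended test length, which is the basic consistency check a teacher performs on a finished TOS.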
CHAPTER 8: SELECTING AND CONSTRUCTING TEST
ITEMS AND TASKS
Content Objectives
• Understand how to prepare and use an achievement test
• Understand when to use an achievement test
Selecting and Constructing Test Items and Tasks
• This chapter familiarizes you with the initial process of developing
classroom tests.
• It covers the test types appropriate for gauging the learning outcomes
proposed by the curriculum standards, how to select them, and how to
construct them.
Test types have three major categories: (a) Supply Type, (b) Selection Type, (c)
Performance Type.
Supply type test items are those which can be answered by a word, phrase, number
or symbol.
Supply type test items can be further divided into short answer test items and
completion test items.
Selection Type
Selection items (or selected response items) are test items on which the examinee
selects one of a set of choices, rather than generating an original response.
A conventional, multiple-choice item typically features a stimulus, an item stem or
question, a correct answer choice, and a set of incorrect choices.
Performance Type
Performance-type items require students to demonstrate a skill or create a
product rather than select or supply an answer. They are used to assess
learning outcomes, such as delivering a speech, conducting an experiment, or
writing a composition, that supply- and selection-type items cannot capture.
Application: Is able to explain the rising prices of vegetables during summer time.
Application: Is able to determine the number of 1 x 1 tiles needed to cover a 50 ft x 100 ft hall.
Alignment of Learning Outcome to Test Types
Knowledge – Simple Understanding – Deep Understanding

Level 3: Applying – Solve, Apply, Modify, Demonstrate, Employ, Calculate, Generate
Level 5: Evaluating – Critique, Assess, Defend, Justify, Appraise, Measure
Level 6: Creating – Plan, Generate, Produce, Design, Construct, Compose
EXAMPLE QUESTIONS:
A. A part of speech that is used to connect clauses or sentences or to
coordinate words in the same clause is called _________.
-The expected response is CONJUNCTION.
B. The novel, El Filibusterismo was written by Dr. Jose Rizal during
_________ colonial period.
-The expected response is SPANISH.
*Gap Filling is another term for this variant, as the student fills several gaps in a
discourse depending on the target outcome.
- Guidelines in the construction of completion items (Kubiszyn and Borich,
2010; McMillan, 2007; Nitko, 2001; Popham, 2011)
ESSAY
What is an Essay?
Essay (n.) | \ ˈe-ˌsā \
an analytic or interpretative literary composition usually dealing with its subject
from a limited or personal point of view
CHARACTERISTICS OF AN ESSAY
- Belongs to supply category because students are required to have a fully
constructed response on their own
- Free-structured: students can organize their response freely and can even
manifest their own creative writing style
- Good for testing deep understanding and reasoning
Restricted-response
- Suggests specification
- Limited coverage of content
- Specified length of response
- Expected form of response
- Definite perspective/mindset to be used
Extended-response
- Does not suggest any form of restriction
- Free to organize & expound ideas
SUGGESTIONS FOR CONSTRUCTING ESSAY QUESTIONS
- by Miller, Linn & Gronlund (2009, p. 243)
1. Restrict the use of essay questions to those learning outcomes that cannot be
measured satisfactorily by objective items
2. Construct questions that will call forth the skills specified in the learning
standards
3. Phrase the question so that the student’s task is clearly defined
4. Indicate approximate time limit for each question
5. Avoid the use of optional questions
RUBRICS/ANALYTIC SCORING
- Helps the teacher score responses, significantly reducing subjectivity and more
or less objectifying the scoring of a non-objective item type
Example:
Criterion           | Exemplary (6 pts) | Very Satisfactory (5 pts) | Satisfactory (4 pts) | Moderately Satisfactory (3 pts) | Fairly Satisfactory (2 pts) | Needs Improvement (1 pt)
Organization        |
Clarity of Message  |
Creativity          |
Total               |
Overall Rating      |
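Analytic scoring with such a rubric can be sketched as follows. The criterion names and ratings are hypothetical; the six descriptive labels follow the scale in the rubric above:

```python
# The 6-point descriptive scale from the rubric above.
SCALE = {6: "Exemplary", 5: "Very Satisfactory", 4: "Satisfactory",
         3: "Moderately Satisfactory", 2: "Fairly Satisfactory",
         1: "Needs Improvement"}

def score_essay(ratings):
    """ratings: criterion -> points (1-6).
    Returns the total and an overall descriptive rating based on the mean."""
    total = sum(ratings.values())
    mean = total / len(ratings)
    return total, SCALE[round(mean)]

# Hypothetical ratings for one student's essay.
total, overall = score_essay(
    {"Organization": 5, "Clarity of Message": 4, "Creativity": 6})
```

Mapping the mean back onto the descriptive scale is one common way to fill the "Overall Rating" row; a teacher may equally report the raw total.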
2. Multiple-Choice Items
Another selected-response item format is the multiple-choice item. The wide use of this
format in classroom testing is mainly due to its versatility in assessing various levels of
understanding, from knowledge and simple understanding to deep understanding.
(Table)
Writing good multiple-choice items requires clarity in stating the problem in the stem and
the plausibility or attractiveness of the distracters.
Stem
1. All the words of the stem should be relevant to the task.
2. The stem should be meaningful by itself and fully contain the problem.
3. The stem should always pose a question with only one correct or clearly best
answer.
Distracters
1. All distracters should appear plausible to uninformed test takers.
2. Randomly assign the correct answer among the alternative positions.
3. Avoid using "All-of-the-above" or "None-of-the-above" as distracters.
3. Matching Types
Of the three general selected-response item formats, the matching item appears different.
It consists of two parallel lists of words or phrases that students are tasked to pair.
(Table)
Guidelines in constructing matching items
1. Keep the list of premises and the list of options homogeneous, i.e., belonging to one category.
2. Keep the premises always in the first column and the options in the second column.
3. Keep the lists in the two columns unequal in number.
4. Test directions should always describe the basis for matching.
5. Keep the number of premises to not more than eight (8) items.
6. Avoid ambiguous lists.
- This chapter provides practical and necessary ways of improving
teacher-developed assessment tools. Popham (2011) suggests two approaches
to item improvement: the judgmental approach and the empirical approach.
A. Judgmentally Based Procedures
- This approach basically makes use of human judgment in reviewing the items.
- There are three sources of test-improvement judgments: teachers, peers, and
students.
- Popham (2011) gives five suggestions for teachers to follow in exercising
judgment.
- When reviewing your own items, be familiar with the general item-writing
guidelines, and use them to find and fix any violations of item-writing
principles.
- Teachers should use these guidelines to check how well the items have been
planned and written, particularly their alignment to the intended
instructional outcomes.
5. Fairness
- Assessment items should be free of bias. They should not favor one group nor
discriminate against another. Be attentive to any potential bias so that you can
eliminate it as much as possible.
Peer review
- There are schools that encourage peer or collegial review of assessment
instruments among themselves. Time is provided for this activity and it has
almost always yielded good results for improving tests and performance-
based assessment tasks.
Student Review
- Engagement of students in reviewing the items has become a laudable
practice for improving classroom tests. The judgment is based on the
students’ experience in taking the test, their impressions, and reactions
during the test event.
B. Empirically Based Procedures
Item improvement using empirically based methods aims to improve the quality
of an item using students’ responses to the test.
- An item is considered good when its quality indices, i.e., the difficulty index
and the discrimination index, meet certain criteria.
Difficulty Index
- Used in the context of criterion-referenced interpretation or testing for
mastery.
- Obtained from nominal data such as frequency, percentage, and proportion.
The useful data are:
a. Total number of students answering the item (T)
b. Total number of students answering the item right (R)
An item’s difficulty index is obtained by calculating the p value (p), which is the
proportion of students answering the item correctly.

p = R/T

where:
p = difficulty index
R = total number of students answering the item right
T = total number of students answering the item
For example:

Item 1: There were 45 students in the class who responded to item 1 and 30
answered it correctly.
p = 30/45 = 0.67
Item 1 has a p value of 0.67. Sixty-seven percent (67%) got the item right while
33% missed it.

Item 2: In the same class, only 10 answered item 2 correctly.
p = 10/45 = 0.22
Item 2 has a p value of 0.22. Out of 45, only 10 or 22% got the item right while
35 or 78% missed it.
FOR A NORM-REFERENCED TEST: Between the two items, item 2 appears to be the
more difficult item since less than a fourth of the class was able to respond correctly.
FOR A CRITERION-REFERENCED TEST: The class shows much better performance
on item 1 than on item 2. It still has a long way to go to master item 2.
- The p value ranges from 0.0 to 1.0, indicating from extremely difficult (no one got it
correct) to extremely easy (everyone got it correct).
- For binary-choice items, there’s a 50% probability of getting the item correct simply
by chance
- For multiple choice items of four alternatives the chance of obtaining a correct answer
by guessing is only 25%.
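The difficulty index is a one-line calculation; a minimal sketch using the two worked items from the text:

```python
def difficulty_index(right, total):
    """p = R / T: proportion of students answering the item correctly."""
    return right / total

# Item 1: 30 of 45 students answered correctly; item 2: 10 of 45.
p1 = difficulty_index(30, 45)  # ~0.67
p2 = difficulty_index(10, 45)  # ~0.22
```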
Discrimination Index
- The power of an item to discriminate between the informed and uninformed
groups, or between more knowledgeable and less knowledgeable students
- This is an item statistic that can reveal useful information for improving an
item
Another calculation brings the same results (Kubiszyn and Borich, 2010):

D = (Ru − RL) / T

where:
Ru = number in the upper group getting the item correct
RL = number in the lower group getting the item correct
T = number of students in either group

Get the discrimination index by getting the difference between the p values of the
two groups.
Example: Obtain the index of discrimination of an item if the upper 25%
of the class had a difficulty index of 0.60
while the lower 25% of the class had a difficulty index of 0.20.
Du = 0.60 while DL = 0.20, thus index of discrimination = 0.60 − 0.20 = 0.40
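The discrimination index can likewise be computed directly. The group size of 10 below is a hypothetical figure consistent with the example's p values of 0.60 and 0.20:

```python
def discrimination_index(upper_right, lower_right, group_size):
    """D = (Ru - RL) / T, equivalently p(upper) - p(lower) for equal groups."""
    return upper_right / group_size - lower_right / group_size

# Hypothetical groups of 10: 6 upper-group and 2 lower-group students correct,
# matching the text's difficulty indices of 0.60 and 0.20.
d = discrimination_index(6, 2, 10)  # D = 0.40
```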
Distracter Analysis
- Used to discover clues for item improvement through an analysis of the
distribution of responses across the distracters, especially when the difficulty
index and discrimination index suggest the item is a candidate for revision.
Example given:
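A sketch of such an analysis on hypothetical response tallies; the option labels, counts, and answer key are illustrative only:

```python
from collections import Counter

# Hypothetical response counts for one 4-option item (key = option B),
# split into the upper- and lower-scoring groups of the class.
upper = Counter({"A": 2, "B": 14, "C": 3, "D": 1})
lower = Counter({"A": 7, "B": 5, "C": 6, "D": 2})

def analyze_distracters(upper, lower, key):
    """Flag each option: the key should attract more upper-group students;
    a working distracter attracts more lower-group students."""
    flags = {}
    for option in sorted(set(upper) | set(lower)):
        if option == key:
            flags[option] = "key" if upper[option] > lower[option] else "review key"
        else:
            flags[option] = "plausible" if lower[option] > upper[option] else "revise"
    return flags

flags = analyze_distracters(upper, lower, key="B")
```

An option flagged "revise" draws no more lower-group than upper-group students, so it is not doing its job as a distracter.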
Sensitivity to Instruction Index
The sensitivity to instruction index (Si) indicates whether the p value obtained for the
item in the post-test is greater than that in the pre-test.
Example given:
Consider an item where, in a class of 40, 80% answered it correctly in the post-test
while only 10% did in the pre-test.
Its p value for the post-test is .80 while for the pre-test it is .10, thus Si = .70,
following this calculation:
Sensitivity to Instruction (Si) = P(post) – P(pre)
= .80 – .10
= .70
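The Si calculation can be sketched as:

```python
def sensitivity_to_instruction(p_post, p_pre):
    """Si = P(post) - P(pre): values near 1.0 suggest the item
    reflects what was learned from instruction."""
    return p_post - p_pre

# Example from the text: p = .80 in the post-test, .10 in the pre-test.
si = sensitivity_to_instruction(0.80, 0.10)  # Si = .70
```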