COURSE

PLANNING A WRITTEN TEST


Why do you need to define the test objectives or learning outcomes targeted for assessment?
In designing a well-planned written test, first and foremost, you should be able to identify
the intended learning outcomes in a course for which a written test is an appropriate method to use.
These learning outcomes are knowledge, skills, attitudes, and values that every student should
develop throughout the course. Clear articulation of learning outcomes is a primary consideration
in lesson planning because it serves as the basis for evaluating the effectiveness of the teaching
and learning process determined through testing or assessment. Learning objectives or outcomes
are measurable statements that articulate, at the beginning of the course, what students should
know and be able to do or value as a result of taking the course. These learning goals provide the
rationale for the curriculum and instruction. They provide teachers the focus and direction on
how the course is to be handled, particularly in terms of course content, instruction, and
assessment. On the other hand, they provide the students with the reasons and motivation to
study and persevere. They give students the opportunities to be aware of what they need to do to
be successful in the course, take control and ownership of their progress, and focus on what they
should be learning. Setting objectives for assessment is the process of establishing direction to
guide both the teacher in teaching and the student in learning.

What are the objectives for testing?


In developing a written test, the cognitive behaviors of learning outcomes are usually
targeted. For the cognitive domain, it is important to identify the levels of behavior expected
from the students. Traditionally, Bloom's Taxonomy was used to classify learning objectives
based on levels of complexity and specificity of the cognitive behaviors. With knowledge at the
base (i.e., lower order thinking skill), the categories progress to comprehension, application,
analysis, synthesis, and evaluation. However, Anderson and Krathwohl, Bloom's student and
research partner, respectively, came up with a revised taxonomy, in which the nouns used to
represent the levels of cognitive behavior were replaced by verbs, and the synthesis and
evaluation were switched. (Figure 2.1 presents the two taxonomies.)

Assessment in Learning

Figure 2.1 Taxonomies of Instructional Objectives


In developing the cognitive domain of instructional objectives, key verbs can be used.

What is a table of specifications?


A table of specifications (TOS), sometimes called a test blueprint, is a tool used by
teachers to design a test. It is a table that maps out the test objectives, contents, or topics covered
by the test; the levels of cognitive behavior to be measured; the distribution, number,
placement, and weights of test items; and the test format. It helps ensure that the course's intended
learning outcomes, assessments, and instruction are aligned.
Generally, the TOS is prepared before a test is created. However, it is ideal to prepare one
even before the start of instruction. Teachers need to create a TOS for every test that they intend
to develop. The TOS is important because it does the following:
• Ensures that the instructional objectives and what the test captures match
• Ensures that the test developer will not overlook details that are considered essential to a
good test
• Makes developing a test easier and more efficient
• Ensures that the test will sample all important content areas and processes
• Is useful in planning and organizing
• Offers an opportunity for teachers and students to clarify achievement expectations


What are the general steps in developing a table of specifications?


Learner assessment within the framework of classroom instruction requires planning. The
following are the steps in developing a table of specifications:
1. Determine the objectives of the test. The first step is to identify the test objectives. This
should be based on the instructional objectives. In general, the instructional objectives or the
intended learning outcomes are identified at the start, when the teacher creates the course
syllabus. There are three types of objectives: (1) cognitive, (2) affective, and (3) psychomotor.
Cognitive objectives are designed to increase an individual's knowledge, understanding, and
awareness. On the other hand, affective objectives aim to change an individual's attitude into
something desirable, while psychomotor objectives are designed to build physical or motor
skills. When planning for assessment, choose only the objectives that can be best captured by a
written test. There are objectives that are not meant for a written test. For example, if you test the
psychomotor domain, it is better to do a performance-based assessment. There are also cognitive
objectives that are sometimes better assessed through performance-based assessment. Those that
require the demonstration or creation of something tangible like projects would also be more
appropriately measured by performance-based assessment. For a written test, you can consider
cognitive objectives, ranging from remembering to creating ideas, that could be measured
using common formats for testing, such as multiple choice, alternative response test, matching
type, and even essays or open-ended tests.
2. Determine the coverage of the test. The next step in creating the TOS is to determine the
contents of the test. Only topics or contents that have been discussed in class and are relevant
should be included in the test.
3. Calculate the weight for each topic. Once the test coverage is determined, the weight of each
topic covered in the test is determined. The weight assigned per topic in the test is based on the
relevance and the time spent to cover each topic during instruction. The percentage of time for a
topic in a test is determined by dividing the time spent for that topic during instruction by the
total amount of time spent for all topics covered in the test. For example, for a test on the
Theories of Personality for a General Psychology 101 class, the teacher spent between 0.5 and 1.5
one-hour class sessions on each topic. As such, the weight for each topic is as follows:
Topic                    | No. of Sessions    | Time Spent         | Percentage of Time (Weight)
Theories and Concepts    | 0.5 class session  | 30 min             | 10.0
Psychoanalytic Theories  | 1.5 class sessions | 90 min             | 30.0
Trait Theories           | 1 class session    | 60 min             | 20.0
Humanistic Theories      | 0.5 class session  | 30 min             | 10.0
Cognitive Theories       | 0.5 class session  | 30 min             | 10.0
Behavioral Theories      | 0.5 class session  | 30 min             | 10.0
Social Learning Theories | 0.5 class session  | 30 min             | 10.0
TOTAL                    | 5 class sessions   | 300 min or 5 hours | 100
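
The weight calculation in step 3 can be sketched in a few lines of Python. This is only an illustration: the minutes come from the Theories of Personality example, while the names minutes_spent and topic_weights are hypothetical and chosen for this sketch.

```python
# Minutes of instruction per topic, taken from the Theories of Personality example.
minutes_spent = {
    "Theories and Concepts": 30,
    "Psychoanalytic Theories": 90,
    "Trait Theories": 60,
    "Humanistic Theories": 30,
    "Cognitive Theories": 30,
    "Behavioral Theories": 30,
    "Social Learning Theories": 30,
}


def topic_weights(minutes):
    """Return each topic's share of the total instruction time, in percent."""
    total = sum(minutes.values())
    return {topic: round(100 * spent / total, 1) for topic, spent in minutes.items()}


weights = topic_weights(minutes_spent)
for topic, weight in weights.items():
    print(f"{topic}: {weight}%")
```

Because the weights are shares of one total, they should always add up to 100; a sum that is noticeably off usually means a topic or a session was left out of the tally.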


4. Determine the number of items for the whole test. To determine the number of items to be
included in the test, the amount of time needed to answer the items is considered. As a general
rule, students are given 30-60 seconds for each item in test formats with choices. For a one-hour
class, at 60 seconds per item, this means that the test should not exceed 60 items. However, because
you also need to allow time for distributing the test papers or booklets and giving instructions,
the number of items should be lower, perhaps just 50 items.
5. Determine the number of items per topic. To determine the number of items to be included for
each topic, the weights per topic are considered. Thus, using the example above, for a 50-item final
test, Theories and Concepts, Humanistic Theories, Cognitive Theories, Behavioral Theories, and
Social Learning Theories will have 5 items each, Trait Theories will have 10 items, and
Psychoanalytic Theories will have 15 items.
Topic                    | Percentage of Time (Weight) | No. of Items
Theories and Concepts    | 10.0                        | 5
Psychoanalytic Theories  | 30.0                        | 15
Trait Theories           | 20.0                        | 10
Humanistic Theories      | 10.0                        | 5
Cognitive Theories       | 10.0                        | 5
Behavioral Theories      | 10.0                        | 5
Social Learning Theories | 10.0                        | 5
TOTAL                    | 100                         | 50 items
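
Steps 4 and 5 are simple arithmetic, and the sketch below works through them under stated assumptions. The function names (max_items, items_per_topic) are hypothetical, and the 10 minutes set aside for distributing papers and giving instructions is an assumed figure chosen only so that the result matches the 50-item example.

```python
def max_items(test_minutes, seconds_per_item, admin_minutes=0):
    """Upper bound on the number of items once administrative time is set aside."""
    answering_seconds = (test_minutes - admin_minutes) * 60
    return answering_seconds // seconds_per_item


def items_per_topic(weights, total_items):
    """Allocate items to topics in proportion to their percentage weights."""
    return {topic: round(total_items * weight / 100) for topic, weight in weights.items()}


# Percentage weights from the Theories of Personality example.
weights = {
    "Theories and Concepts": 10.0,
    "Psychoanalytic Theories": 30.0,
    "Trait Theories": 20.0,
    "Humanistic Theories": 10.0,
    "Cognitive Theories": 10.0,
    "Behavioral Theories": 10.0,
    "Social Learning Theories": 10.0,
}

# Assumption: 10 of the 60 minutes go to distribution and instructions.
total_items = max_items(test_minutes=60, seconds_per_item=60, admin_minutes=10)
allocation = items_per_topic(weights, total_items)

print(total_items)
for topic, count in allocation.items():
    print(f"{topic}: {count} items")
```

With less convenient weights, the rounded counts may not add up to the planned total, so the allocation should be checked and adjusted by hand before it goes into the TOS.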

The Different Formats of a Test Table of Specifications


1. One-Way TOS. A one-way TOS maps out the content or topic, test objectives, number of
hours spent, and format, number, and placement of items. This type of TOS is easy to develop
and use because it just works around the objectives without considering the different levels of
cognitive behaviors. However, a one-way TOS cannot ensure that all levels of cognitive
behaviors that should have been developed by the course are covered in the test.
Topic | Test Objective | No. of Hours Spent | Format and Placement of Items | No. and Percent of Items
Theories and Concepts | Recognize important concepts in personality theories | 0.5 | Multiple Choice, Item #s 1-5 | 5 (10.0%)
Psychoanalytic Theories | Identify the different theories of personality under the Psychoanalytic Model | 1.5 | Multiple Choice, Item #s 6-20 | 15 (30.0%)
Etc.
TOTAL | | 5 | | 50 (100%)


2. Two-Way TOS. A two-way TOS reflects not only the content, time spent, and number of items
but also the levels of cognitive behavior targeted per test content based on the theory behind
cognitive testing. For example, the common framework for testing at present in the DepEd
Classroom Assessment Policy is the Revised Bloom's Taxonomy (DepEd, 2015). One advantage
of this format is that it allows one to see the levels of cognitive skills and dimensions of
knowledge that are emphasized by the test. It also shows the framework of assessment used in
the development of the test. However, this format is more complex than the one-way format.

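Columns R to C: Level of Cognitive Behavior, Item Format, No. and Placement of Items

Content | Time Spent | No. & Percent of Items | KD* | R | U | AP | AN | E | C
Theories and Concepts | 0.5 hours | 5 (10.0%) | F | I.3 #1-3 | | | | |
 | | | C | | I.2 #4-5 | | | |
Psychoanalytic Theories | 1.5 hours | 15 (30.0%) | F | I.2 #6-7 | | | | |
 | | | C | | I.2 #8-9 | I.2 #10-11 | | |
 | | | P | | | I.2 #12-13 | I.2 #14-15 | |
 | | | M | | | | I.3 #16-18 | II.1 #41 | II.1 #42
Etc.
Scoring | | | | R and U: 1 point per item | AP and AN: 2 points per item | E and C: 5 points per item
OVERALL TOTAL | 5 | 50 (100.0%) | | R and U: 20 | AP and AN: 20 | E and C: 10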
Another presentation is shown below:

Columns R to C: Level of Cognitive Behavior and Knowledge Dimension*, Item Format, No. and Placement of Items

Content | Time Spent | No. of Items | R | U | AP | AN | E | C
Theories and Concepts | 0.5 hours | 5 (10.0%) | I.3 #1-3 (F) | I.2 #4-5 (C) | | | |
Psychoanalytic Theories | 1.5 hours | 15 (30.0%) | I.2 #6-7 (F) | I.2 #8-9 (C) | I.2 #10-11 (C) | I.2 #14-15 (P) | II.1 #41 (M) | II.1 #42 (M)
 | | | | | I.2 #12-13 (P) | I.3 #16-18 (M) | |
Etc.
Scoring | | | R and U: 1 point per item | AP and AN: 2 points per item | E and C: 3 points per item
OVERALL TOTAL | | 50 (100.0%) | R and U: 20 | AP and AN: 20 | E and C: 10

*Legend: KD = Knowledge Dimension (Factual, Conceptual, Procedural, Metacognitive)
I – Multiple Choice; II – Open-Ended
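
One practical use of a two-way TOS is checking that the items placed in the cells add up to the number of items planned for each topic. The sketch below records the cells of the sample TOS as plain tuples and runs that check. The cognitive-level column assigned to each cell is an assumption about the table layout, and the helper names (items_by_topic, mismatched_topics) are hypothetical.

```python
from collections import Counter

# Planned number of items per topic, from the sample TOS.
planned_items = {"Theories and Concepts": 5, "Psychoanalytic Theories": 15}

# Each cell: (topic, knowledge dimension, cognitive level, number of items).
# The cognitive level of each cell is an assumed reading of the sample table.
cells = [
    ("Theories and Concepts", "F", "R", 3),     # items 1-3
    ("Theories and Concepts", "C", "U", 2),     # items 4-5
    ("Psychoanalytic Theories", "F", "R", 2),   # items 6-7
    ("Psychoanalytic Theories", "C", "U", 2),   # items 8-9
    ("Psychoanalytic Theories", "C", "AP", 2),  # items 10-11
    ("Psychoanalytic Theories", "P", "AP", 2),  # items 12-13
    ("Psychoanalytic Theories", "P", "AN", 2),  # items 14-15
    ("Psychoanalytic Theories", "M", "AN", 3),  # items 16-18
    ("Psychoanalytic Theories", "M", "E", 1),   # item 41
    ("Psychoanalytic Theories", "M", "C", 1),   # item 42
]


def items_by_topic(cells):
    """Total the items placed in the cells of each topic."""
    totals = Counter()
    for topic, _dimension, _level, count in cells:
        totals[topic] += count
    return totals


def mismatched_topics(cells, planned):
    """Return topics whose cell totals differ from the planned item counts."""
    totals = items_by_topic(cells)
    return {t: (totals[t], n) for t, n in planned.items() if totals[t] != n}


print(dict(items_by_topic(cells)))
print(mismatched_topics(cells, planned_items))
```

An empty result from mismatched_topics means every topic's cells account for exactly the planned number of items.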



3. Three-Way TOS. This type of TOS reflects the features of one-way and two-way TOS. One
advantage of this format is that it challenges the test writer to classify objectives based on the
theory behind the assessment. It also shows the variability of thinking skills targeted by the test.
However, it takes much longer to develop this type of TOS.
Columns R to C: Level of Cognitive Behavior and Knowledge Dimension*, Item Format, No. and Placement of Items

Content | Learning Objective | Time Spent | No. of Items | R | U | AP | AN | E | C
Theories and Concepts | Recognize important concepts in personality theories | 0.5 hours | 5 (10.0%) | I.3 #1-3 (F) | I.2 #4-5 (C) | | | |
Psychoanalytic Theories | Identify the different theories of personality under psychoanalytic model | 1.5 hours | 15 (30.0%) | I.2 #6-7 (F) | I.2 #8-9 (C) | I.2 #10-11 (C) | I.2 #14-15 (P) | II.1 #41 (M) | II.1 #42 (M)
 | | | | | | I.2 #12-13 (P) | I.3 #16-18 (M) | |
Etc.
Scoring | | | | R and U: 1 point per item | AP and AN: 2 points per item | E and C: 3 points per item
OVERALL TOTAL | | | 50 (100.0%) | R and U: 20 | AP and AN: 20 | E and C: 10
*Legend: KD = Knowledge Dimension (Factual, Conceptual, Procedural, Metacognitive)
I – Multiple Choice; II – Open-Ended

CONSTRUCTION OF WRITTEN TEST

What are the general guidelines in choosing the appropriate test format?
Not every test is universally valid for every type of learning outcome. For example, if an
intended outcome for a Research Method 1 course is “to design and produce a research study
relevant to one’s field of study,” you cannot measure this outcome through a multiple-choice test
or a matching type test.
To guide you in choosing the appropriate test format and designing fair and appropriate
yet challenging tests, you should ask the following important questions:
1. What are the objectives or desired learning outcomes of the subject/unit/lesson being assessed?
Deciding on what test format to use generally depends on your learning objectives or the
desired learning outcomes of the subject/unit/lesson. Desired learning outcomes (DLOs) are
statements of what learners are expected to do or demonstrate as a result of engaging in the
learning process.
2. What level of thinking is to be assessed (i.e., remembering, understanding, applying,
analyzing, evaluating, and creating)? Does the cognitive level of the test question match your
instructional objectives or DLOs?
The level of thinking to be assessed is also an important factor to consider when
designing your test, as this will guide you in choosing the appropriate test format. For example, if
you intend to assess how much your learners are able to identify important concepts discussed in
class (i.e., remembering or understanding level), a selected-response format such as multiple-
choice test would be appropriate. However, if you intend to assess how your students will be able
to explain and apply in another setting a concept or a framework learned in class (i.e., applying
and/or analyzing level), you may consider giving constructed-response test formats such as
essays.
It is important that when constructing classroom assessment tools, all levels of cognitive
behavior are represented, from Remembering (R), Understanding (U), Applying (Ap),
Analyzing (An), and Evaluating (E) to Creating (C), while taking into consideration the Knowledge
Dimensions, i.e., Factual (F), Conceptual (C), Procedural (P), and Metacognitive (M).
3. Is the test matched and aligned with the course’s DLOs and the course contents or learning
activities?
The assessment tasks should be aligned with the instructional activities and DLOs. Thus,
it is important that you are clear about what DLOs are to be addressed by your test and what
course activities or tasks are to be implemented to achieve the DLOs.
For example, if you want learners to articulate and justify their stand on ethical
decision-making and social responsibility practice in business (i.e., DLO), then an essay test and
a class debate are appropriate measures and tasks for this learning outcome. A multiple-choice
test may be used, but only if you intend to assess learners’ ability to recognize what is ethical
versus unethical decision-making practice. In the same manner, matching type items may be
appropriate if you want to know whether your students can differentiate and match the different
approaches or terms to their definitions.

4. Are the test items realistic to the students?


Test items should be meaningful and realistic to the learners. They should be relevant or
related to their everyday experiences. The use of concepts, terms, or situations that have not been
discussed in the class or that they have never encountered, read, or heard about should be
minimized or avoided. This is to prevent learners from making wild guesses, which will
undermine your measurement of what they have really learned from the class.

The Two Major Categories and Formats of Traditional Tests


1. Selected-Response Tests require learners to choose the correct answer or best alternative from
several choices. While they can cover a wide range of learning materials very efficiently and
measure a variety of learning outcomes, they are limited when assessing learning outcomes that
involve more complex and higher-level thinking skills. Selected-response tests include:
• Multiple-Choice Test – is a commonly used format in formal testing and typically
consists of a stem (problem), one correct or best alternative (correct answer), and three or
more incorrect or inferior alternatives (distractors).
• True-False or Alternative Response Test – generally consists of a statement that the
learner must judge as true (accurate/correct) or false (inaccurate/incorrect).
• Matching-Type Test – consists of two sets of items to be matched with each other based
on a specified attribute.

2. Constructed-Response Tests require learners to supply answers to a given question or
problem. These include:
• Short Answer Test – consists of open-ended questions or incomplete sentences that
require learners to create an answer for each item, which is typically a single word or
short phrase. This includes the following types:
    ◦ Completion – consists of incomplete statements that require the learners to
    fill in the blanks with the correct word or phrase.
    ◦ Identification – consists of statements that require the learners to identify
    or recall the terms/concepts, people, places, or events that are being
    described.
    ◦ Enumeration – requires the learners to list down all possible answers to
    the question.
• Essay Test – consists of problems/questions that require learners to compose or construct
written responses, usually long ones with several paragraphs.
• Problem-Solving Test – consists of problems/questions that require learners to solve
problems in quantitative or non-quantitative settings using knowledge and skills in
mathematical concepts and procedures, and/or other higher-order cognitive skills (e.g.,
reasoning, analysis, and critical thinking skills).

What are general guidelines in writing multiple-choice test items?


Writing multiple-choice items requires content mastery, writing skills, and time. Only
good and effective items should be included in the test. Poorly written test items could be
confusing and frustrating to learners and yield test scores that are not appropriate to evaluate
their learning and achievement. The following are the general guidelines in writing a good
multiple-choice item. They are classified in terms of content, stem, and options.
Content:
1. Write items that reflect only one specific content area and cognitive processing skill.
Faulty: Which of the following is a type of statistical procedure used to test a hypothesis
regarding significant relationship between variables, particularly in terms of the extent and
direction of association?
a. ANCOVA b. ANOVA c. Correlation d. t-test
Good: Which of the following is an inferential statistical procedure used to test a hypothesis
regarding significant differences between two qualitative variables?
a. ANCOVA b. ANOVA c. Chi-square d. Mann-Whitney Test
2. Do not lift and use statements from the textbook or other learning materials as test questions.
3. Keep the vocabulary simple and understandable based on the level of learners/examinees.
4. Edit and proofread the items for grammatical and spelling errors before administering them to
the learners.
Stem:
1. Write the directions in the stem in a clear and understandable manner.
Faulty: Read each question and indicate your answer by shading the circle corresponding to your
answer.
Good: This test consists of two parts. Part A is a reading comprehension test, and Part B is a
grammar/language test. Each question is a multiple-choice item with five (5) options. You are to
answer each question but will not be penalized for a wrong answer or for guessing. You can go back
and review your answers during the time allotted.
2. Write stems that are consistent in form and structure, that is, present all items either in
question form or in descriptive or declarative form.
Faulty: (1) Who was the president of the Philippines during Martial Law?
(2) The first president of the Commonwealth of the Philippines was ________.

Good: (1) Who was the Philippine president during Martial Law?
(2) Who was the first president of the Commonwealth of the Philippines?

3. Word the stem positively and avoid negatives, such as NOT and EXCEPT, in the stem. If
a negative word is necessary, underline or capitalize the word for emphasis.

Faulty: Which of the following is not a measure of variability?


Good: Which of the following is NOT a measure of variability?

4. Refrain from making the stem too wordy or containing too much information unless the
problem/question requires the facts presented to solve the problem.

Faulty: What does DNA stand for, and what is the organic chemical of complex molecular
structure found in all cells and viruses and codes genetic information for the transmission of
inherited traits?
Good: As a chemical compound, what does DNA stand for?

5. Do not use unfamiliar words, terms, and phrases. The ability of the item to discriminate or its
level of difficulty should stem from the subject matter rather than from the wording of the
question.
Example: What would be the system reliability of a computer system whose slave and peripherals
are connected in parallel circuits and each one has a known time to failure probability of 0.05?
A student completely unfamiliar with the terms “slave” and “peripherals” may not be able
to answer correctly even if they know the subject matter of reliability.
6. Do not use modifiers that are vague and whose meanings can differ from one person to the
next, such as “much,” “often,” and “usually.”
Example: Much of the process of photosynthesis takes place in the:
a. bark b. leaf c. stem
The qualifier “much” is vague and could have been replaced by more specific qualifiers
like: “90% of the photosynthetic process” or some similar phrase that would be more precise.
7. Avoid complex or awkward word arrangements. Also, avoid the use of negatives in the stem, as
this may add unnecessary comprehension difficulties.
Faulty: As President of the Republic of the Philippines, Corazon Cojuangco Aquino would stand
next to which President of the Philippine Republic after the 1986 EDSA Revolution?
Good: Who was the President of the Philippines after Corazon C. Aquino?
8. Do not use negatives or double negatives, as such statements tend to be confusing. It is best to
use simpler sentences rather than sentences that would require expertise in grammatical
construction.
Faulty: (1) Which of the following will not cause inflation in the Philippine economy?
(2) What does the statement “Development patterns acquired during the formative years
are NOT unchangeable” imply?

Good: (1) Which of the following will cause inflation in the Philippine economy?
(2) What does the statement “Development patterns acquired during the formative years
are changeable” imply?

9. Each item stem should be as short as possible; otherwise, you risk testing more for reading and
comprehension skills.

Options:

1. Provide three (3) to five (5) options per item, with only one being the correct or best
answer/alternative.

2. Write options that are parallel or similar in form and length to avoid giving clues about the
correct answer.

Faulty: What is an ecosystem?

a. It is a community of living organisms in conjunction with the nonliving components of
their environment that interact as a system. These biotic and abiotic components are
linked together through nutrient cycles and energy flows.
b. It is a place on Earth’s surface where life dwells.
c. It is an area that one or more individual organisms defend against competition from
other organisms.
d. It is the biotic and abiotic surroundings of an organism or population.
e. It is the largest division of the Earth’s surface filled with living organisms.

Good: What is an ecosystem?

a. It is a place on Earth’s surface where life dwells.
b. It is the biotic and abiotic surroundings of an organism or population.
c. It is the largest division of the Earth’s surface filled with living organisms.
d. It is a large community of living and non-living organisms in a particular area.
e. It is an area that one or more individual organisms defend against competition from
other organisms.

3. Place the correct response randomly to avoid a discernible pattern of correct answers.

4. Make all options realistic and reasonable.



5. Place options in a logical order (e.g., alphabetical, from shortest to longest).

Example: Which experimental gas law describes how the pressure of a gas tends to increase as the
volume of the container decreases? (i.e., “The absolute pressure exerted by a given mass of an
ideal gas is inversely proportional to the volume it occupies.”)

Faulty: a. Boyle’s Law b. Charles’s Law c. Beer-Lambert Law
d. Avogadro’s Law e. Faraday’s Law

Good: a. Avogadro’s Law b. Beer-Lambert Law c. Boyle’s Law
d. Charles’s Law e. Faraday’s Law
6. Use None-of-the-above carefully and only when there is one absolutely correct answer, such as
spelling or math items.

Example: Which of the following is a nonparametric statistic?

Faulty: a. ANCOVA b. ANOVA c. Correlation
d. t-test e. None of the Above

Good: a. ANCOVA b. ANOVA c. Correlation
d. Mann-Whitney Test e. t-test

7. Avoid All of the Above as an option, especially if it is intended to be the correct answer.

Faulty: Who among the following has become the President of the Philippine Senate?

a. Ferdinand Marcos b. Manuel Quezon c. Manuel Roxas
d. Quintin Paredes e. All of the Above

Good: Who was the first ever President of the Philippine Senate?

a. Eulogio Rodriguez b. Ferdinand Marcos c. Manuel Quezon
d. Manuel Roxas e. Quintin Paredes

8. Distracters should be equally plausible and attractive.

Example: The short story “May Day Eve” was written by which Filipino author?

a. Jose Garcia Villa b. Nick Joaquin c. Genoveva Edroza Matute
d. Robert Frost e. Edgar Allan Poe

If distracters had all been Filipino authors, the value of the item would be greatly
increased. In this particular instance, only the first three carry the burden of the entire item since
the last two can be essentially disregarded by the student.
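
Two of the option guidelines above, placing options in a logical order and avoiding a discernible pattern of correct answers, lend themselves to a quick mechanical check. The following is a minimal sketch, not a prescribed procedure: the helper names are hypothetical, and the sample answer key at the end is invented purely for illustration.

```python
from collections import Counter


def order_options(options):
    """Sort options alphabetically, ignoring case (one possible logical order)."""
    return sorted(options, key=str.lower)


def key_letter(options, correct):
    """Letter of the correct answer once the options are in their final order."""
    return "abcdefghij"[options.index(correct)]


def key_distribution(answer_key):
    """Count how often each letter is the correct answer across a test."""
    return Counter(answer_key)


options = order_options(
    ["Boyle's Law", "Charles's Law", "Beer-Lambert Law", "Avogadro's Law", "Faraday's Law"]
)
print(options)
print(key_letter(options, "Boyle's Law"))  # c

# Hypothetical answer key for a short test; a heavily lopsided count would
# suggest a pattern that test-wise learners could exploit.
print(key_distribution(["c", "a", "c", "e", "b", "d", "a"]))
```

Ordering the options by a fixed rule and then reviewing the key distribution keeps the position of the correct answer from depending on the item writer's habits.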

What are the general guidelines in writing matching type items?

The matching test item format requires learners to match a word, sentence, or phrase in
one column (i.e., premise) to a corresponding word, sentence, or phrase in a second column (i.e.,
response). It is most appropriate when you need to measure the learners’ ability to identify
the relationship or association between similar items. It works best when the course content
has many parallel concepts. While the matching-type test format is generally used for simple recall
of information, you can find ways to make it applicable or useful in assessing higher levels of
thinking such as applying and analyzing.

The following are the general guidelines in writing good and effective matching-type tests:

1. Clearly state in the directions the basis for matching the stimuli with the responses.
Faulty: Directions: Match the following.
Good: Directions: Column I is a list of countries while Column II presents the continents where
these countries are located. Write the letter of the continent corresponding to the country on the
line provided in Column I.

Item #1’s instruction is less preferred as it does not detail the basis for matching the stem
and the response options.

2. Ensure that the stimuli are longer, and the responses are shorter.

Example: Match the description of the flag to its country.

Faulty:
A B
Bangladesh a. Green background with red circle in the center
Indonesia b. One red stripe on top and one white stripe at the bottom
Japan c. Red background with white five-petal flower in the center
Singapore d. Red background with large yellow circle in the center
Thailand e. Red background with large yellow pointed star in the center
f. White background with large red circle in the center

Good:
A B
Green background with red circle in the center a. Bangladesh
One red stripe on top and one white stripe at the bottom b. Indonesia
Red background with white five-petal flower in the center c. Japan
Red background with large yellow circle in the center d. Singapore
Red background with large yellow pointed star in the center e. Thailand
f. Vietnam

Item #2 is a better version because the descriptions are presented in the first column
while the response options are in the second column. The stems are also longer than the options.

3. For each item, include only topics that are related with one another and share the same
foundation of information.

Faulty: Match the following:


A B
1. Indonesia a. Asia
2. Malaysia b. Bangkok
3. Philippines c. Jakarta
4. Thailand d. Kuala Lumpur
5. Year ASEAN was established e. Manila
f. 1967

Good: On the line to the left of each country in Column I, write the letter of the country’s capital
presented in Column II.

Column I Column II
1. Indonesia a. Bandar Seri Begawan
2. Malaysia b. Bangkok
3. Philippines c. Jakarta
4. Thailand d. Kuala Lumpur
e. Manila

Item #1 is considered an unacceptable item because its response options are not parallel
and include different kinds of information that can provide clues to the correct/wrong answers.
On the other hand, item #2 details the basis for matching and the response options only include
related concepts.

4. Make the response options short, homogeneous, and arranged in logical order.

Faulty: Match the chemical elements with their characteristics.


A B
Gold a. Au
Hydrogen b. Magnetic metal used in steel
Iron c. Hg
Potassium d. K
Sodium e. With lowest density
f. Na
Good: Match the chemical elements with their symbols.
A B
Gold a. Au
Hydrogen b. Fe
Iron c. H
Potassium d. Hg
Sodium e. K
f. Na
In item #1, response options are not parallel in content and length. They are also not
arranged alphabetically.

5. Include response options that are reasonable and realistic and similar in length and
grammatical form.

Faulty: Match the subjects with their course descriptions.

A B
History a. Studies the production and distribution of goods/services
Political Science b. Study of politics and power
Psychology c. Study of Society
Sociology d. Understand role of mental functions in social behavior
e. Uses narrative to examine and analyze past events

Good: Match the subjects with their course descriptions.

A B
1. Study of living things a. Biology
2. Study of mind and behavior b. History
3. Study of politics and power c. Political Science
4. Study of recorded events in the past d. Psychology
5. Study of society e. Sociology
f. Zoology

Item #1 is less preferred because the response options are not consistent in terms of their
length and grammatical form.

6. Provide more response options than the number of stimuli.

Example: Match the following fractions with their corresponding decimal equivalents:

Faulty:
A B
1/4 a. 0.25
5/4 b. 0.28
7/25 c. 0.90
9/10 d. 1.25

Good:
A B
1/4 a. 0.25
5/4 b. 0.28
7/25 c. 0.90
9/10 d. 1.25
e. 0.09

Item #1 is considered inferior to Item #2 because it includes the same number of
response options as that of the stimuli, thus making it more prone to guessing.
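
The effect of an extra response option on guessing can be made concrete with a little arithmetic. The sketch below assumes that each response is used at most once and that a learner who knows some matches guesses uniformly among the responses left over; both assumptions, and the function name last_match_guess_chance, are introduced here only for illustration.

```python
from fractions import Fraction


def last_match_guess_chance(num_responses, num_known):
    """Chance of guessing one remaining match after eliminating known responses.

    Assumes each response is used at most once and the guess is uniform
    over the responses that have not been eliminated.
    """
    remaining = num_responses - num_known
    return Fraction(1, remaining)


# Four stimuli, learner knows three of the matches.
print(last_match_guess_chance(4, 3))  # 1   (four responses: the last match is free)
print(last_match_guess_chance(5, 3))  # 1/2 (five responses: still a real guess)
```

With equal numbers of stimuli and responses, the final match is determined by elimination alone, which is why the item with a spare response option gives a cleaner measure.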

What are the general guidelines in writing true or false items?

True or false items are used to measure learners’ ability to identify whether a statement or
proposition is correct/true or incorrect/false. They are best used when learners’ ability to judge or
evaluate is one of the desired learning outcomes of the course.

There are different variations of the true or false items. These include the following:

1. T-F Correction or Modified True-or-False Question. In this format, the statement is
presented with a key word or phrase that is underlined, and the learner has to supply the correct
word or phrase.

Ex. Multiple-Choice Test is authentic.

2. Yes-No Variation. In this format, the learner has to choose yes or no, rather than true or false.

Ex. The following are kinds of tests. Circle Yes if it is an authentic test and No if not.

Multiple Choice Test Yes No
Debates Yes No
End-of-the-Term Project Yes No
True or False Test Yes No

3. A-B Variation. In this format, the learner has to choose A or B, rather than true or false.

Ex. Indicate which of the following are traditional or authentic tests by circling A if it is a
traditional test and B if it is authentic.
Traditional Authentic
Multiple Choice Test A B
Debates A B
End-of-the Term Project A B
True or False Test A B

Because true or false test items are prone to guessing, as learners are asked to choose
between two options, utmost care should be exercised in writing true or false items. The
following are the general guidelines in writing true or false items:

1. Include statements that are completely true or completely false.

Faulty: The presidential system of government, where the president is only the head of state or
government, is adopted by the United States, Chile, Panama, and South Korea.

Good: The presidential system, where the president is only the head of state or government, is
adopted by Chile.

Item #1 is of poor quality because, while the description is right, the countries given are
not all correct. While South Korea has a presidential system of government, it also has a prime
minister who governs alongside the president.

2. Use simple and easy-to-understand statements.

Faulty: Education is a continuous process of higher adjustment for human beings who have
evolved physically and mentally, which is free and conscious of God, as manifested in nature
around the intellectual, emotional, and humanity of man.

Good: Education is the process of facilitating learning or the acquisition of knowledge, skills,
values, beliefs, and habits.

Item #1 is somewhat confusing, especially for younger learners because there are many
ideas in one statement.

3. Refrain from using negatives - especially double negatives.

Faulty: There is nothing illegal about buying goods through the internet.

Good: It is legal to buy things or goods through the internet.

Double negatives are sometimes confusing and could result in wrong answers, not
because the learner does not know the answer but because of how the test items are presented.

4. Avoid using absolutes such as “always” and “never”.

Faulty: The news and information posted on the CNN website is always accurate.

Good: The news and information posted on the CNN website is usually accurate.

Absolute words such as “always” and “never” restrict possibilities and require a statement
to be true 100% of the time. They are also a hint for a “false” answer.

5. Express a single idea in each test item.

Faulty: If an object is accelerating, a net force must be acting on it, and the acceleration of an
object is directly proportional to the net force applied to the object.

Good: If an object is accelerating, a net force must be acting on it.

Item #1 combines two ideas in a single statement, so a learner who is unsure of either one
cannot respond to the item with confidence.



6. Avoid the use of unfamiliar words or vocabulary.

Faulty: Esprit de corps among soldiers is important in the face of hardships and opposition in
fighting the terrorists.

Good: Military morale is important in the face of hardships and opposition in fighting the terrorists.

Students may have a difficult time understanding the statement, especially if the phrase
“esprit de corps” has not been discussed in class. Using unfamiliar words would likely lead to
guessing.

7. Avoid lifting statements from the textbook and other learning materials.

What are the general guidelines in writing short-answer test items?

A short-answer test item requires the learner to answer a question or to finish an
incomplete statement by filling in the blank with the correct word or phrase. While it is most
appropriate when you only intend to assess learners' lower-level thinking, such as their ability to
recall facts learned in class, you can create items that minimize guessing and avoid giving clues to
the correct answer.

The following are the general guidelines in writing good fill-in-the-blank or completion
test items:

1. Omit only significant words from the statements.

Faulty: Every atom has a central ______ called a nucleus.

Good: Every atom has a central core called a(n) ______.

In item #1, the word “core” is not the significant word. The item is also prone to many
and varied interpretations, resulting in many possible answers.

2. Do not omit too many words from the statement such that the intended meaning is lost.

Faulty: ______ is to Spain as the ______ is to United States and as ______ is to Germany.

Good: Madrid is to Spain as the ______ is to France.

Item #1 is prone to many and varied answers. For example, a student may answer the
question based on the capitals of these countries or based on the continents where they are
located. Item #2 is preferred because it is more specific and requires only one correct answer.

3. Avoid obvious clues to the correct response.

Faulty: Ferdinand Marcos declared martial law in 1972. Who was the president during that period?

Good: The president during the martial law years was ______.

Item #1 already gives a clue that Ferdinand Marcos was the president during this time
because only the president of a country can declare martial law.

4. Be sure that there is only one correct response.

Faulty: The government should start using renewable energy sources for generating electricity
such as ______.

Good: The government should start using renewable sources of energy by using turbines called ______.

Item #1 has many possible answers because the statement is very general (e.g., wind,
solar, biomass, geothermal, and hydroelectric). Item #2 is more specific and only requires one
correct answer (i.e., wind).

5. Avoid grammatical clues to the correct response.

Faulty: A subatomic particle with a negative electric charge is called an ______.

Good: A subatomic particle with a negative electric charge is called a(n) ______.

The word “an” in item #1 provides a clue that the correct answer starts with a vowel.

6. If possible, put the blank at the end of the statement rather than at the beginning.

Faulty: ______ is the basic building block of matter.

Good: The basic building block of matter is ______.

In item #1, learners may need to read the sentence until the end before they can recognize
the problem, and then re-read it before answering the question. On the other hand, in item
#2, learners can already identify the context of the problem by reading through the sentence only
once, without having to go back and re-read it.

What are the general guidelines in writing essay tests?


Teachers generally choose and employ essay tests over other forms of assessment
because essay tests require learners to create a response rather than to simply select a response
from among alternatives. They are the preferred form of assessment when teachers want to
measure learners' higher-order thinking skills, particularly their ability to reason, analyze,
synthesize, and evaluate. They also assess learners' writing abilities. They are most appropriate
for assessing learners' (1) understanding of subject matter content, (2) ability to reason with their
knowledge of the subject, and (3) problem-solving and decision-making skills because items or
situations presented in the test are authentic or close to real-life experiences.

There are two types of essay test: (1) extended-response essay and (2) restricted-response essay.

Extended-Response: Requires much longer and more complex responses.
Restricted-Response: Is much more focused and restrained.

Examples:

How do the leopard and the tiger differ? Support your answer with details and information from
the article.

Tina is preparing a demonstration to display at her school's science fair. She needs to show the
effect of salt on the buoyancy of an egg.

Part A: Identify at least two other actions that would make Tina's demonstration better.
Part B: Explain why each action would improve the demonstration.

The following are the general guidelines in constructing good essay questions:

1. Clearly define the intended learning outcome to be assessed by the essay test.

To design effective essay questions or prompts, the specific intended learning outcomes
must first be identified. If the intended learning outcomes to be assessed lack clarity and
specificity, the questions or prompts may assess something other than what they intend to assess.
Appropriate directive verbs that most closely match the ability that learners should demonstrate
must be used in the prompts. These include verbs such as compose, analyze, interpret, explain,
and justify, among others.

2. Refrain from using essay test for intended learning outcomes that are better assessed by other
kinds of assessment.

Some intended learning outcomes can be efficiently and reliably assessed by selected-
type test rather than by essay test. In the same manner, there are intended learning outcomes that
are better assessed using other authentic assessments, such as performance test, rather than by
essay test. Thus, it is important to take into consideration the limitations of essay tests when
planning and deciding what assessment method to employ for an intended learning outcome.

3. Clearly define and situate the task within a problem situation as well as the type of thinking
required to answer the test.

Essay questions or prompts should provide clear and well-defined tasks to the learners. It
is important to carefully choose the directive verb, to write clearly the object or focus of the
directive verb, and to delimit the scope of the task. Having clear and well-defined tasks will
guide learners on what to focus on when answering the prompts, thus avoiding responses that
contain ideas that are unrelated or irrelevant, too long, or focusing only on some part of the task.
Emphasizing the type of thinking required to answer the question will also guide students on the
extent to which they should be creative, deep, complex, and analytical in addressing and
responding to the questions.

4. Present tasks that are fair, reasonable, and realistic to the students.

Essay questions should contain tasks or questions that students will be able to do or
address. These include those that are within the level of instruction, training, expertise, and
experience of the students.

5. Be specific in the prompts about the time allotment and criteria for grading the response.

Essay prompts and directions should indicate the approximate time given to the students
to answer the essay questions to guide them on how much time they should allocate for each
item, especially if several essay questions are presented. How the responses are to be graded or
rated should also be clarified to guide the students on what to include in their responses.

What are the general guidelines in writing problem-solving test items?

Problem-solving test items are used to measure learners' ability to solve problems that
require quantitative knowledge and competencies and/or critical thinking skills. These items
present a problem situation or task that will require learners to demonstrate work procedures or
come up with a correct solution. Full or partial credit can be assigned to the answers, depending
on the answers or solutions required.

There are different variations of the quantitative problem-solving item. These include the
following:

1. One answer choice - This type of question contains four or five options, and students are
required to choose the best answer.

Example: What is the mean of the following score distribution: 32, 44, 56, 69, 75, 77, 95, 96?

A. 68 B. 69 C. 72 D. 74 E. 76

The correct answer is A (68).

2. All possible answer choices - This type of question has four or five options, and students are
required to choose all of the options that are correct.

Example: Consider the following score distribution: 12, 14, 14, 14, 17, 24, 27, 28, 30. Which of
the following is/are the correct measure/s of central tendency? Indicate all possible
answers.

A. Mean = 20 B. Mean = 22 C. Median = 16
D. Median = 17 E. Mode = 14

Options A, D, and E are all correct answers.
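
The answer keys of the two examples above can be double-checked with a few lines of Python. This is a minimal sketch rather than part of the original module. It assumes that the first distribution is 32, 44, 56, 69, 75, 77, 95, 96 (the same one used in the type-in example later in this section), and the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Example 1 (one answer choice); the scores are an assumed reading of the item.
scores_1 = [32, 44, 56, 69, 75, 77, 95, 96]
options_1 = {"A": 68, "B": 69, "C": 72, "D": 74, "E": 76}
key_1 = [k for k, v in options_1.items() if v == mean(scores_1)]

# Example 2 (all possible answer choices).
scores_2 = [12, 14, 14, 14, 17, 24, 27, 28, 30]
computed = {
    "mean": mean(scores_2),
    "median": median(scores_2),
    "mode": mode(scores_2),
}
options_2 = {
    "A": ("mean", 20), "B": ("mean", 22), "C": ("median", 16),
    "D": ("median", 17), "E": ("mode", 14),
}
# Keep every option whose claimed value equals the computed measure.
key_2 = [k for k, (m, v) in options_2.items() if computed[m] == v]

print(key_1, key_2)
```

Running a check like this before finalizing the test helps confirm that exactly one option is keyed in the first format and that every correct option is keyed in the second.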



3. Type-In answer - This type of question does not provide options to choose from. Instead, the
learners are asked to supply the correct answer. The teacher should inform the learners at the
start how their answers will be rated. For example, the teacher may require just the correct
answer or may require learners to present the step-by-step procedures in coming up with their
answers. On the other hand, for non-mathematical problem solving, such as a case study, the
teacher may present a rubric showing how their answers will be rated.

Example: Compute the mean of the following score distribution: 32, 44, 56, 69, 75, 77, 95, 96.
Indicate your answer in the blank provided.

In this case, the learners will only need to give the correct answer without having to show
the procedures for computation.

Example: Lillian, a 55-year-old accountant, has been suffering from frequent dizziness, nausea,
and lightheadedness. During the interview, Lillian was obviously restless and sweating.
She reported feeling so stressed and fearful of anything without any apparent reason. She
could not sleep and eat well. She also started to withdraw from family and friends, as she
experienced frequent panic attacks. She also said that she was constantly worrying about
everything at work and at home. What might be Lillian's problem? What should she do to
alleviate all her symptoms?

Problem-solving test items are a good test format as they minimize guessing, measure
instructional objectives that focus on higher cognitive levels, and cover an extensive amount of
content or topics. However, they require more time for teachers to construct, read, and correct,
and are prone to rater bias, especially when scoring rubrics/criteria are not available. It is
therefore important that good quality problem-solving test items are constructed.

The following are some of the general guidelines in constructing good problem-solving test items:

1. Identify and explain the problem clearly.

Faulty: Tricia was 135.6 lbs. when she started with her zumba/aerobics exercises. After three
months of attending the sessions three times a week, her weight was down to 122.8 lbs.
About how many lbs. did she lose after three months? Write your final answer in the space
provided and show your computations. [This question asks, "about how many" and does
not indicate whether learners need to give the exact weight or whether they need to round
off their answer and to what extent.]

Good: Tricia was 135.6 lbs. when she started with her zumba/aerobics exercises. After three months
of attending the sessions three times a week, her weight was down to 122.8 lbs. How
many lbs. did she lose after three months? Write your final answer in the space provided
and show your computations. Write the exact weight; do not round off.
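
To see why the wording matters, the short sketch below computes both the exact and the rounded weight loss for this item. It is only an illustration. It uses Python's `decimal` module so that the subtraction is carried out in exact decimal arithmetic, and the variable names are hypothetical.

```python
from decimal import Decimal

start_weight = Decimal("135.6")  # lbs., at the start of the sessions
end_weight = Decimal("122.8")    # lbs., after three months

exact_loss = start_weight - end_weight             # what the "Good" item asks for
rounded_loss = exact_loss.quantize(Decimal("1"))   # nearest whole pound

print(exact_loss, rounded_loss)
```

The exact answer (12.8 lbs.) and the rounded answer (13 lbs.) differ, so an item that only asks "about how many" leaves the learner unsure which of the two the answer key expects.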

2. Be specific and clear about the type of response required from the students.

Faulty: ASEANA Bottlers, Inc. has been producing and selling Tutti Fruity juice in the Philippines,
aside from their Singapore market. The sales for the juice in the Singapore market were
S$5 million more than those of their Philippine market in 2016, S$3 million more in 2017,
and S$4.5 million more in 2018. If the sales in the Philippine market in 2018 were PHP35 million,
what were the sales in the Singapore market during that year? [This is a faulty question
because it does not specify in what currency the answer should be presented.]

Good: ASEANA Bottlers, Inc. has been producing and selling Tutti Fruity juice in the Philippines,
aside from their Singapore market. The sales for the juice in the Singapore market were
S$5 million more than those of their Philippine market in 2016, S$3 million more in 2017,
and S$4.5 million more in 2018. If the sales in the Philippine market in 2018 were PHP35 million,
what were the sales in the Singapore market during that year? Provide the answer in Singapore
dollars (S$1 = PHP36.50). [This is a better item because it specifies in what currency the
answer should be presented, and the exchange rate is given.]
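
A worked solution shows how the stated currency and exchange rate make a single answer possible. The sketch below is an illustration under two assumptions that should be checked against the intended item: the exchange rate is read as S$1 = PHP36.50, and the 2018 Singapore sales are taken to be S$4.5 million more than the Philippine sales. The variable names are hypothetical.

```python
PHP_PER_SGD = 36.50                # assumed rate: S$1 = PHP36.50
philippine_sales_php = 35_000_000  # 2018 Philippine sales, in pesos
singapore_lead_sgd = 4_500_000     # assumed: S$4.5 million more in 2018

# Convert the Philippine sales to Singapore dollars, then add the difference.
philippine_sales_sgd = philippine_sales_php / PHP_PER_SGD
singapore_sales_sgd = philippine_sales_sgd + singapore_lead_sgd

print(f"S${singapore_sales_sgd:,.2f}")
```

Under these assumptions the Singapore sales come to roughly S$5.46 million. Without the currency and the rate, a learner could just as reasonably report the result in pesos, which is why the faulty version cannot be scored consistently.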

3. Specify in the directions the bases for grading students' answers/procedures.

Faulty: VCV Consultancy Firm was commissioned to conduct a survey on the voters'
preferences in Visayas and Mindanao for the upcoming presidential election. In Visayas,
65% are for Liberal Party (LP) candidate, while 35% are for the Nationalist Party (NP)
candidate. In Mindanao, 70% of the voters are Nationalists, while 30% are LP supporters.
A survey was conducted among 200 voters for each region. What is the probability that
the survey will show a greater percentage of Liberal Party supporters in Mindanao than in
the Visayas region? [This question is undesirable because it does not specify the basis for
grading the answer.]

Good: VCV Consultancy Firm was commissioned to conduct a survey on voters' preferences in
Visayas and Mindanao for the upcoming presidential election. In Visayas, 65% are for
Liberal Party (LP) candidate, while 35% are for the Nationalist Party (NP) candidate. In
Mindanao, 70% of the voters are Nationalists while 30% are LP supporters. A survey was
conducted among 200 voters for each region.

What is the probability that the survey will show a greater percentage of Liberal Party
supporters in Mindanao than in the Visayas region? Please show your solutions to support your
answer. Your answer will be graded as follows:

0 points = for wrong answer and wrong solution
1 point = for correct answer only (i.e., without or wrong solution)
3 points = for correct answer with partial solutions
5 points = for correct answer with complete solutions
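
A grading scheme like the one above can also be written down as a small scoring function, which makes any gaps in the scheme easy to spot. The sketch below is a hypothetical illustration; the function name `score_response` and the solution labels are assumptions. The scheme does not say how to score a wrong answer that comes with a partial or complete solution, so the sketch assumes that any wrong answer earns 0 points.

```python
def score_response(answer_correct: bool, solution: str) -> int:
    """Return the points for one response under the grading scheme above.

    `solution` is one of "none", "wrong", "partial", or "complete".
    """
    if solution not in {"none", "wrong", "partial", "complete"}:
        raise ValueError(f"unknown solution label: {solution!r}")
    if not answer_correct:
        # Assumption: the scheme only lists "wrong answer and wrong solution";
        # here every wrong answer earns 0, whatever the solution shows.
        return 0
    if solution == "complete":
        return 5
    if solution == "partial":
        return 3
    return 1  # correct answer only (no solution or a wrong one)


print(score_response(True, "partial"))
```

Writing the scheme this way highlights the one case the directions leave open, so the teacher can decide in advance whether a wrong answer with a sound partial solution deserves credit.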