

MODULE 5 Development of Assessment Tools


OVERVIEW

An assessment is a diagnostic process that measures an individual’s behaviors, motivators, attitudes, and competencies. Assessment tools comprise various instruments and procedures. These tools are widely used in educational institutions, nonprofit organizations, and corporate settings. Successful design and development of assessment tools depends on the use of scientific methods.
In this module, we will discuss the development of classroom assessment tools, particularly the table of specifications and test construction. The table of specifications helps you, a would-be teacher, in preparing your examinations. It is your blueprint in test construction and a useful guide in determining the types of test items you need to construct.
LEARNING OBJECTIVES

At the end of the lesson, you are expected to:
 Identify the different types of objective tests;
 Discuss and present different types of tests;
 Develop skills in item analysis and validation; and
 Discuss effectively about item analysis and validation.

What do you already know?

Answer the following questions:

1. List all the types of tests you have encountered.

2. Among the types of tests, which do you think is the easiest one? Explain your answer.

3. Have you ever asked yourself how your teacher constructed your examination?

Let’s get to know more

Development of Assessment Tools

In developing assessment instruments, the candidates to be assessed should always be kept in mind at each step of the process. Different scenarios to be assessed call for different tools and modes of evaluation. Ensure that the instruments and procedures for assessing are relevant to the audience, the skills, and the task for which they are being evaluated.

Knowledge and Reasoning


Answer set programming is an approach to knowledge representation and reasoning.
Knowledge is represented as answer set programs, and reasoning is performed by
answer set solvers. Answer set programming enables default reasoning, which is required
in commonsense reasoning.

Planning a Test and Construction of Table of Specification


In very simple terms, test planning refers to planning the activities that must be performed during testing in order to achieve the objectives of the test. Test planning for each test level of the project begins at the start of that level’s testing process and continues until the closing activities of that level are complete.

Preparing a Table of Specification


A Table of Specification is a chart or table that details the content and the cognitive levels assessed on a test, as well as the types and emphases of test items. Consider the following steps in making a table of specification.
1. Select the learning outcomes to be measured. Identify the instructional objectives students must meet to answer the test items correctly.
2. Make an outline of the subject matter to be covered in the test.
3. Decide on the number of items per topic.
4. Make your table of specifications.
5. Construct the test items.

A table of specification is composed of the contents, number of class sessions, percentage, number of items, and the cognitive levels. The header may follow Krathwohl’s revised taxonomy (Remember, Understand, Apply, Analyze, Evaluate, Create) or Bloom’s original taxonomy (Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation), as shown in the two layouts below. A sketch for allocating items across topics follows the tables.

Content | No. of Sessions | % | No. of Items | Krathwohl’s cognitive level: R  U  Ap  An  E  C

Content | No. of Sessions | % | No. of Items | Bloom’s cognitive level: K  C  Ap  An  S  E
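Step 3 above, deciding on the number of items per topic, is usually done in proportion to the number of class sessions spent on each topic. Here is a minimal Python sketch of that allocation (not part of the original module); the topic names and session counts are hypothetical, and the rounded item counts may need a small manual adjustment so they sum exactly to the intended total.

def allocate_items(sessions_per_topic, total_items):
    """Return (percentage, item count) per topic, proportional to sessions."""
    total_sessions = sum(sessions_per_topic.values())
    allocation = {}
    for topic, sessions in sessions_per_topic.items():
        pct = sessions / total_sessions * 100
        # Rounding can make the counts sum to slightly more or less than
        # total_items; adjust the largest topic by hand if needed.
        allocation[topic] = (round(pct, 1), round(total_items * pct / 100))
    return allocation

# Example: 30 class sessions and 50 items, the values used in the activity below.
topics = {"Demand and Supply": 12, "Elasticity": 10, "Consumer Behavior": 8}
for topic, (pct, items) in allocate_items(topics, 50).items():
    print(f"{topic}: {pct}% -> {items} items")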

The table of specification provides the test constructor a way to ensure that the assessment is based on the intended learning outcomes. As a would-be teacher, you must have your lesson plan at hand for easy access to the information needed in preparing the table of specification. Your lesson plan will make test construction easier, for it is your guide throughout the planning session.
There are things to consider in preparing your examination, such as the appropriateness and the qualities of assessment tools. This also ensures that the learning objectives are congruent with the activities you are going to give your students. The qualities of assessment tools serve as your guide in ensuring the validity, reliability, fairness, objectivity, scorability, adequacy, administrability, practicality, and efficiency of your instrument as well as of the test results.

Construct a Table of Specification guided by the following:

1. Use the TOS format.
2. Select a subject area based on your specialization.
3. The number of items is 50.
4. The number of class sessions is 30.
5. Use either header layout shown above:

Content | No. of Sessions | % | No. of Items | Krathwohl’s cognitive level: R  U  Ap  An  E  C

Content | No. of Sessions | % | No. of Items | Bloom’s cognitive level: K  C  Ap  An  S  E

General Guidelines for Constructing Test Items


Kubiszyn and Borich (2007) suggested general guidelines for writing test items to help classroom teachers improve the quality of the test items they write:
1. Begin writing items far enough in advance that you will have time to revise them.
2. Match items to intended outcomes at an appropriate level of difficulty to provide
a valid measure of instructional objectives. Limit the questions to the skill being
assessed.
3. Be sure each item deals with an important aspect of the content area and not
with trivia.
4. Be sure the problem posed is clear and unambiguous.
4

5. Be sure that the item is independent of all other items. The answer to one item
should not be required as a condition in answering the next item. A hint to one
answer should not be embedded in another item.
6. Be sure the item has one correct or best answer on which experts would agree.
7. Prevent unintended clues to the answer in the statement or question. Grammatical inconsistencies, such as “a” versus “an,” give clues to the correct answer to those students who are not well prepared for the test.
8. Avoid replication of the textbook in writing test items; do not quote directly from
the textual materials. You are usually not interested in how well students memorize
the text.
9. Avoid trick or catch questions in an achievement test. Do not waste time testing how well students can interpret your intentions.
10. Try to write items that require higher-order thinking skills.

Determining the Number of Test Items


Determining the number of test items depends on the assessment format you will be using in the examination. Aside from the difficulty of the items, the time available for the examination determines how many items can be included. Here are suggestions that will guide you on the number of test items, considering the format and the average time a student needs to answer each item. A sketch for estimating total examination time follows the table.

Assessment Format          Average Time to Answer
True-False                 30 seconds
Multiple Choice            60 seconds
Multiple Choice (HOTS)     90 seconds
Short Answer               120 seconds
Completion Type            60 seconds
Matching Type              30 seconds per item
Short Essay                10-15 minutes
Extended Essay             30 minutes
Visual Image               seconds
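The timing table can also be used in reverse: given a planned mix of items, estimate how long the examination will take. A minimal Python sketch (not from the module), using the per-item averages above; only the objective formats are included, and the example item mix is hypothetical.

# Average seconds per item, taken from the table above.
SECONDS_PER_ITEM = {
    "true_false": 30,
    "multiple_choice": 60,
    "multiple_choice_hots": 90,
    "short_answer": 120,
    "completion": 60,
    "matching": 30,
}

def exam_minutes(item_counts):
    """Estimate total examination time in minutes for objective-type items."""
    total_seconds = sum(SECONDS_PER_ITEM[fmt] * n for fmt, n in item_counts.items())
    return total_seconds / 60

# Example: a 50-item examination mixing several formats.
mix = {"true_false": 10, "multiple_choice": 25, "multiple_choice_hots": 5, "matching": 10}
print(f"Estimated time: {exam_minutes(mix):.1f} minutes")  # -> 42.5 minutes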

Types of Objective Tests


 Constructing True-False Tests
 Multiple Choice Tests
 Matching Type and Supply Type Items
 Essays

Constructing True-False Tests


In this type of test, the examinees determine whether the statement presented is true or false. The true or false item is an example of a “forced-choice” test because there are only two possible choices. The students are required to choose true or false to recognize whether a statement is correct or incorrect.
A true or false test is appropriate for assessing behavioral objectives such as “identify,” “select,” or “recognize.” It is also suited to assessing the knowledge and comprehension levels of the cognitive domain. This type of test is appropriate when there are only two plausible alternatives or distracters.

Examples of True or False Type of Test


Directions: Write your answer before the number in each item. Write T if the statement
is true and F if the statement is false.

T F 1. The Cry of the Rebellion happened in present-day Quezon City.


T F 2. The Cavite Mutiny is an event that led to the execution of the GOMBURZA.
T F 3. Only one account of the martyrdom of the GOMBURZA is questioned by historians.

Advantages of a True or False Test


1. It covers a lot of content in a short period.
2. It is easier to prepare than multiple-choice and matching-type tests.
3. It is easier to score because it can be scored objectively, unlike a test that depends on the judgment of the rater(s).
4. It is useful when there are only two alternatives.
5. Scores are more reliable than essay test scores.

Disadvantages of a True or False Test

1. It is limited to low-level thinking skills such as knowledge and comprehension, or recognition and recall of information.
2. There is a high probability of guessing the correct answer (50%), compared to a multiple-choice item with four options (25%).

Multiple Choice Tests


A multiple-choice test is used to measure knowledge outcomes and other types of learning outcomes such as comprehension and application. It is the most commonly used format in measuring student achievement at different levels of learning.
A multiple-choice item consists of three parts: the stem, the keyed option, and the incorrect options or alternatives. The stem presents the problem or question, usually expressed in completion form or question form. The keyed option is the correct answer. The incorrect options or alternatives are also called distracters or foils.

Example and Parts of a Multiple-Choice Item


3. What is chiefly responsible for the increase in the average length of life in the USA during the last fifty years?
Distractor A. Compulsory health and physical education courses in public schools.
Answer    *B. The reduced death rate among infants and young children.
Distractor C. The safety movement, which has greatly reduced the number of deaths from accidents.
Distractor D. The substitution of machines for human labor.
To make your good exams better, and to make your better exams the best, try to avoid
these exam writing mistakes.

Avoid vague stems by stating the problem in the stem:


Poor Example
California:
A. Contains the tallest mountain in the United States.
B. Has an eagle on its state flag.
C. Is the second largest state in terms of area.
*D. Was the location of the Gold Rush of 1849.

Good Example
What is the main reason so many people moved to California in 1849?
A. California's land was fertile, plentiful, and inexpensive.
*B. Gold was discovered in central California.
C. The east was preparing for a civil war.
D. They wanted to establish religious settlements.

Avoid wordy stems by removing irrelevant data:


Poor Example
Suppose you are a mathematics professor who wants to determine whether or not your
teaching of a unit on probability has had a significant effect on your students. You
decide to analyze their scores from a test they took before the instruction and their
scores from another exam taken after the instruction. Which of the following t-tests is
appropriate to use in this situation?
*A. Dependent samples.
B. Heterogeneous samples.
C. Homogeneous samples.
D. Independent samples.

Good Example
When analyzing your students’ pretest and posttest scores to determine if your teaching has had a significant effect, the appropriate statistic to use is the t-test for:
*A. Dependent samples.
B. Heterogeneous samples.
C. Homogeneous samples.
D. Independent samples.

Avoid negatively worded stems by stating the stem in a positive form:


Poor Example
A nurse is assessing a client who has pneumonia. Which of these assessment findings
indicates that the client does NOT need to be suctioned?
A. Diminished breath sounds.
*B. Absence of adventitious breath sounds.
C. Inability to cough up sputum.
D. Wheezing following bronchodilator therapy.

Good Example
Which of these assessment findings, if identified in a client who has pneumonia,
indicates that the client needs to be suctioned?
A. Absence of adventitious breath sounds.
B. Respiratory rate of 18 breaths per minute.
*C. Inability to cough up sputum.
D. Wheezing before bronchodilator therapy.
Note: Test-writing experts believe that negatively worded stems confuse students.

Matching Type Test


The matching type item consists of two columns. Column A contains the descriptions and is placed on the left side, while Column B contains the options and is placed on the right side. The examinees are asked to match the options with the descriptions they are associated with.

Example of a Matching Type Test Item


Direction: Write the letter of the phrase in Column B that best describes the term in Column A. Write the letter of the correct answer on the blank before the number.

        Column A                                     Column B
_____ 1. Microeconomics                        a. Shift to the right
_____ 2. Macroeconomics                        b. Percentage change in quantity demanded
_____ 3. Theories                              c. Economy as a whole
_____ 4. Variables                             d. Shift to the left
_____ 5. Unemployment                          e. Percentage change in quantity supplied
_____ 6. Increase in income                    f. Hypothesis
_____ 7. Decrease in population                g. Responsiveness of demand/supply
_____ 8. Decrease in the number of sellers     h. Shift to the right
_____ 9. Elasticity                            i. Price theory
_____ 10. Price elasticity of demand           j. Problem that leads to the existence of idle resources
_____ 11. Price elasticity of supply           k. Subject to change or variation
_____ 12. Abraham Maslow                       l. Commodities being purchased
_____ 13. Marginal utility                     m. Amount of satisfaction
_____ 14. Total utility                        n. Hierarchy of needs
_____ 15. Budget line                          o. Dissatisfaction from the last unit of consumption

Advantages of Matching Type Test


1. It is simpler to construct than a multiple-choice test.
2. It reduces the effect of guessing compared to multiple-choice and true or false tests.
3. It is appropriate for assessing the association between facts.
4. It provides easy, accurate, efficient, objective, and reliable test scores.
5. More content can be covered in a given set of test items.

Disadvantages of Matching Type Test


1. It measures only simple recall or memorization of information.
2. It is difficult to construct due to problems in selecting the descriptions and options.
3. It assesses only the low levels of the cognitive domain, such as knowledge and comprehension.

Supply Type of Test Items


Supply type items require students to create and supply their own answers or perform a certain task to show mastery of knowledge or skills. They are also known as constructed-response tests. Supply type items or constructed-response tests are classified as:
a. Short answer or completion type
b. Essay type items
Another way of assessing student performance is through performance-based assessment and portfolio assessment, which are also categorized under constructed-response tests.
A subjective test item requires the student to organize and present an original answer (essay test), perform tasks to show mastery of learning (performance-based assessment and portfolio assessment), or supply a word or phrase to answer a certain question (completion or short answer test).

Kinds of Subjective Type Test Items

The subjective type of test is another test format in which the student supplies answers rather than selecting the correct answer. It covers the completion type or short answer test and the essay type item.

Completion Type or Short Answer Test


Completion or short answer type is an alternative form of assessment because the examinee needs to supply or create the appropriate word(s), symbol(s), or number(s) to answer a question or complete a statement, rather than selecting the answer from given options. There are two ways of constructing completion or short answer items: question form and complete-the-statement form.

Guidelines in Constructing Completion Type or Short Answer Tests


1. The item should require a single-word answer or a brief and definite statement. Do not use an indefinite statement that allows several answers.
2. Be sure that the language used in the statement is precise and accurate for the subject being tested.
3. Be sure to omit only keywords; do not eliminate so many words that the meaning of the item statement changes.
4. Do not leave the blank at the beginning of or within the statement. It should be at the end of the statement.
5. Use direct questions rather than incomplete statements. The statement should pose the problem to the examinee.
6. Be sure to indicate the units in which the answer is to be expressed when the statement requires a numerical answer.
7. Be sure that the answer the student is required to produce is factually correct.
8. Avoid grammatical clues.
9. Do not select textbook sentences.

Examples of Completion and Short Answer


Directions: Write your answer before the number in each item. Write the word(s), phrase, or symbol(s) to complete the statement.

Question Form
__________ 1. Which supply type item is used to measure the ability to organize and integrate material?
__________ 2. What are the incorrect options in a multiple-choice item called?
__________ 3. What do you call a polygon that has five sides?
__________ 4. What is the most complex level in Bloom’s taxonomy of the cognitive domain?
__________ 5. Which test item measures the greatest variety of learning outcomes?

Completion Form
1. The supply type item used to measure the ability to organize and integrate material is called _________________.
2. The incorrect options in a multiple-choice test item are called ______________.
3. A polygon with five sides is called ______________.
4. The most complex level in Bloom’s taxonomy of the cognitive domain is called ___________.
5. The test item that measures the greatest variety of learning outcomes is called __________.

Advantages of a Completion or Short Answer Test


1. It covers a broad range of topics in a short period.
2. It is easier to prepare and less time-consuming than multiple-choice and matching-type tests.
3. It can effectively assess the lower levels of Bloom’s taxonomy: it assesses recall of information rather than recognition.
4. It reduces the possibility of guessing the correct answer, because it requires recall, unlike true or false and multiple-choice items.
5. It covers a greater amount of content than matching-type tests.

Disadvantages of a Completion or Short Answer Test

1. It is only appropriate for questions that can be answered with short responses.
2. Scoring is difficult when the questions are not prepared properly and clearly. The question should be clearly stated so that the expected answer is clear.
3. It can assess only the knowledge, comprehension, and application levels of Bloom’s taxonomy of the cognitive domain.
4. It is not adaptable to measuring complex learning outcomes.
5. Scoring is tedious and time-consuming.

Item Analysis and Validation


Item analysis uses statistics and expert judgment to evaluate tests based on the
quality of individual items, item sets, and entire sets of items, as well as the relationship
of each item to other items. It “investigates the performance of items considered
individually either in relation to some external criterion or in relation to the remaining items
on the test” (Thompson & Levitov, 1985, p. 163). It uses this information to improve item and test quality. Item analysis concepts are similar for norm-referenced and criterion-referenced tests, but they differ in specific, significant ways.

There are two important characteristics of an item that will be of interest to the
teacher. These are: (a) item difficulty, and (b) discrimination index. We shall learn how to
measure these characteristics and apply our knowledge in making a decision about the
item in question.
The difficulty of an item, or item difficulty, is defined as the number of students who are able to answer the item correctly divided by the total number of students. Thus:

Item difficulty = number of students with correct answers / total number of students

The item difficulty is usually expressed as a percentage.
Example: What is the item difficulty index of an item if 25 students are unable to answer it correctly while 75 answered it correctly?

Here, the total number of students is 100; hence, the item difficulty index is 75/100 or 75%.
One problem with this type of difficulty index is that it may not actually indicate that
the item is difficult (or easy). A student who does not know the subject matter will naturally
be unable to answer the item correctly even if the question is easy. How do we decide on
the basis of this index whether the item is too difficult or too easy? The following arbitrary
rule is often used in the literature:

Range of Difficulty Index     Interpretation      Action
0 – 0.25                      Difficult           Revise or discard
0.26 – 0.75                   Right difficulty    Retain
0.76 and above                Easy                Revise or discard
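To illustrate the two rules above, here is a minimal Python sketch (not from the module) that computes an item’s difficulty index from a hypothetical list of student responses and interprets it against the table:

def difficulty_index(responses):
    """Proportion of students answering the item correctly (0.0 to 1.0)."""
    return sum(responses) / len(responses)

def interpret_difficulty(p):
    """Apply the rule of thumb from the table above."""
    if p <= 0.25:
        return "Difficult - revise or discard"
    elif p <= 0.75:
        return "Right difficulty - retain"
    else:
        return "Easy - revise or discard"

# Example from the text: 75 of 100 students answered the item correctly.
responses = [True] * 75 + [False] * 25
p = difficulty_index(responses)
print(f"Difficulty index: {p:.2f} -> {interpret_difficulty(p)}")  # 0.75 -> retain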

Difficult items tend to discriminate between those who know and those who do not
know the answer. Conversely, easy items cannot discriminate between these two groups
of students. We are therefore interested in deriving a measure that will tell us whether an
item can discriminate between these two groups of students. Such a measure is called
an index of discrimination.

An easy way to derive such a measure is to measure how difficult an item is with
respect to those in the upper 25% of the class and how difficult it is with respect to those
in the lower 25% of the class. If the upper 25% of the class found the item easy yet the
lower 25% found it difficult, then the item can discriminate properly between these two
groups. Thus:
Index of discrimination = DU - DL
Example: Obtain the index of discrimination of an item if the upper 25% of the class
had a difficulty index of 0.60 (i.e. 60% of the upper 25% got the correct answer) while the
lower 25% of the class had a difficulty index of 0.20.

Here, DU = 0.60 while DL= 0.20, thus index of discrimination = .60 - .20 =.40.
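The same computation as a minimal Python sketch (not from the module); the student IDs and scores are hypothetical, and students are ranked by total test score before the upper and lower 25% are taken.

def discrimination_index(item_correct, total_scores, fraction=0.25):
    """DU - DL, using the top and bottom fraction of students by total score."""
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    n = max(1, int(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    du = sum(item_correct[s] for s in upper) / n   # difficulty for upper group
    dl = sum(item_correct[s] for s in lower) / n   # difficulty for lower group
    return du - dl

# Hypothetical example with eight students (s8 scored highest, s1 lowest).
scores = {f"s{i}": i for i in range(1, 9)}
correct = {"s8": 1, "s7": 1, "s6": 1, "s5": 0, "s4": 1, "s3": 0, "s2": 0, "s1": 0}
print(discrimination_index(correct, scores))  # -> 1.0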

Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1) to 1.0 (when DU = 1 and DL = 0). When the index of discrimination is equal to -1, then this
means that all of the lower 25% of the students got the correct answer while all of the
upper 25% got the wrong answer. In a sense, such an index discriminates correctly
between the two groups, but the item itself is highly questionable. Why should the bright ones get the wrong answer and the poor ones get the right answer? On the other hand, if
the index of discrimination is 1.0, then this means that all of the lower 25% failed to get
the correct answer while all of the upper 25% got the correct answer. This is a perfectly
discriminating item and is the ideal item that should be included in the test. From these
discussions, let us agree to discard or revise all items that have a negative discrimination index, for although they discriminate correctly between the upper and lower 25% of the class, the content of the item itself may be highly dubious. As in the case of the index of difficulty, we have the following rule of thumb:

The correct response is B. Let us compute the difficulty index and index of discrimination:
Difficulty Index = no. of students getting correct response/ total
= 40/100 = 40%, within range of a “good item”
The discrimination index can similarly be computed:
DU = no. of students in upper 25% with correct response/no. of students in the upper 25%
= 15/20 = .75 or 75%
DL = no. of students in lower 25% with correct response/ no. of students in the lower 25%
= 5/20 = .25 or 25%
Discrimination Index = DU – DL = .75 - .25 = .50 or 50%.

Thus, the item also has a “good discriminating power”.

It is also instructive to note that distracter A is not an effective distracter, since it was never selected by the students. Distracters C and D appear to have good appeal as distracters.
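This kind of distracter check is easy to automate. A minimal Python sketch (with hypothetical response data) that flags options no examinee selected:

from collections import Counter

def unused_distracters(choices, options=("A", "B", "C", "D"), key="B"):
    """Return distracters that no examinee selected; candidates for revision."""
    counts = Counter(choices)
    return [opt for opt in options if opt != key and counts[opt] == 0]

# Hypothetical responses for one item: nobody chose distracter A.
choices = ["B"] * 40 + ["C"] * 35 + ["D"] * 25
print(unused_distracters(choices))  # -> ['A']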

Item Difficulty
Item difficulty is the percentage of people who answer an item correctly. It is the
relative frequency with which examinees choose the correct response (Thorndike,
Cunningham, Thorndike, & Hagen, 1991). It has an index ranging from a low of 0 to a high
of +1.00.
Higher difficulty indexes indicate easier items. An item answered correctly by 75% of the examinees has an item difficulty level of .75. An item answered correctly by 35% of the examinees has an item difficulty level of .35.
Item difficulty is a characteristic of the item and the sample that takes the test. For
example, a vocabulary question that asks for synonyms for English nouns will be easy for
American graduate students in English literature, but difficult for elementary children. Item
difficulty provides a common metric to compare items that measure different domains,
such as questions in statistics and sociology, making it possible to determine which item is more difficult for the same group of examinees.
both the variability of test scores and the precision with which test scores discriminate
among groups of examinees (Thorndike, Cunningham, Thorndike, & Hagen, 1991). In
discussing procedures to determine minimum and maximum test scores, Thompson and
Levitov (1985) said that
Items tend to improve test reliability when the percentage of students who correctly
answer the item is halfway between the percentage expected to correctly answer if pure
guessing governed responses and the percentage (100%) who would correctly answer if
everyone knew the answer.
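Thompson and Levitov’s rule reduces to a one-line computation: the ideal difficulty is halfway between the chance level and 100%. A minimal sketch, assuming that pure guessing yields a probability of one over the number of options:

def ideal_difficulty(num_options):
    """Halfway point between the guessing probability and 1.0."""
    chance = 1 / num_options
    return (chance + 1.0) / 2

print(ideal_difficulty(4))  # four-option multiple choice -> 0.625
print(ideal_difficulty(2))  # true-false -> 0.75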

Index of Difficulty

        P = (Ru + RL) / T × 100

Where:
  Ru – the number in the upper group who answered the item correctly
  RL – the number in the lower group who answered the item correctly
  T  – the total number who tried the item

Item Discrimination

Item discrimination compares the number of high scorers and low scorers who answer an item correctly. It is the extent to which items discriminate among trainees in the high and low groups. The total test and each item should measure the same thing. High performers should be more likely to answer a good item correctly, and low performers more likely to answer incorrectly. Scores range from -1.00 to +1.00, with an ideal score of +1.00. Positive coefficients indicate that high-scoring examinees tended to have higher scores on the item, while a negative coefficient indicates that low-scoring examinees tended to outperform high scorers on the item. On items that discriminate well, more high scorers than low scorers will answer those items correctly.
To compute item discrimination, a test is scored, scores are rank ordered, and the highest- and lowest-scoring 27 percent of examinees are selected (Kelley, 1939). The number of correct answers in the lowest 27 percent is subtracted from the number of correct answers in the highest 27 percent. This result is divided by the number of people in the larger of the two groups. Twenty-seven percent is used because “this value will maximize differences in normal distributions while providing enough cases for analysis” (Wiersma & Jurs, 1990, p. 145). Comparing the upper and lower groups promotes stability by maximizing differences between the two groups. The percentage of individuals included in the highest and lowest groups can vary: Nunnally (1972) suggested 25 percent, while SPSS (1999) uses the highest and lowest one-third. A sketch of this procedure follows the quotation below.
Wood (1960) stated that
When more students in the lower group than in the upper group select the right answer
to an item, the item actually has negative validity. Assuming that the criterion itself has
validity, the item is not only useless but is actually serving to decrease the validity of the
test.
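As referenced above, here is a minimal Python sketch of the 27 percent procedure; the score and response arrays are hypothetical, and the group size is simply rounded to the nearest whole number.

def kelley_discrimination(item, totals, fraction=0.27):
    """(correct in top 27%) minus (correct in bottom 27%), over the group size."""
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    n = max(1, round(len(order) * fraction))
    high = sum(item[i] for i in order[:n])   # correct answers, highest scorers
    low = sum(item[i] for i in order[-n:])   # correct answers, lowest scorers
    return (high - low) / n

# Hypothetical data: ten students' total scores and their results on one item
# (1 = correct, 0 = incorrect), listed in the same order.
totals = [95, 90, 88, 80, 72, 70, 65, 60, 55, 40]
item   = [1,  1,  1,  0,  1,  0,  0,  1,  0,  0]
print(kelley_discrimination(item, totals))  # -> 0.666...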

Estimating the Index of Difficulty: Worked Example

        P = (Ru + RL) / T × 100

Where:
  P       – percentage who answered the item correctly (index of difficulty)
  Ru + RL – number who answered the item correctly
  T       – total number who tried the item

        P = 8/20 × 100 = 40%

The smaller the percentage figure, the more difficult the item.

Estimate the item discriminating power using the formula below:

        D = (Ru - RL) / (½T) = (6 - 2) / 10 = 0.40

The discrimination power of an item is reported as a decimal fraction; maximum discriminating power is indicated by an index of 1.00. Maximum discrimination is usually found at the 50 percent level of difficulty.

For the difficulty index, the following interpretation may be used:
0.00 – 0.20 = Very difficult
0.21 – 0.80 = Moderately difficult
0.81 – 1.00 = Very easy

Validation
Validity is the extent to which a test measures what it purports to measure; it also refers to the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results. These two definitions differ in the sense that the first refers to the test itself while the second refers to the decisions the teacher makes based on the test.

Face validity estimates whether a test measures what it claims to measure. It is the extent
to which a test seems relevant, important, and interesting. It is the least rigorous measure
of validity.

Content validity is the degree to which a test matches a curriculum and accurately
measures the specific training objectives on which a program is based. Typically it uses
expert judgment of qualified experts to determine if a test is accurate, appropriate, and
fair.

Criterion-related validity measures how well a test compares with an external criterion. It
includes:

Predictive validity is the correlation between a predictor and a criterion obtained at a later time (e.g., a test score on a specific competence and a caseworker’s later performance of job-related tasks).

Concurrent validity is the correlation between a predictor and a criterion at the same point
in time (e.g., performance on a cognitive test related to training and scores on a Civil
Service examination).

Construct validity is the extent to which a test measures a theoretical construct (e.g., a
researcher examines a personality test to determine if the personality typologies account
for actual results).
Validity is an overall evaluative judgment, founded on empirical evidence and
theoretical rationales, of the adequacy and appropriateness of inferences and actions
based on test scores. As such, validity is an inductive summary of both the adequacy of existing evidence for and the appropriateness of potential consequences of test interpretation and use (Messick, 1988, pp. 33-34).

What you have read is just some of the information about the development of assessment tools. Let us put this information into a meaningful learning experience.

Let’s do and discover

Find the index of difficulty in each of the following situations:

1. N = 60, number of wrong answers: upper 25% = 2, lower 25% = 6

2. N = 80, number of wrong answers: upper 25% = 2, lower 25% = 9

3. N = 30, number of wrong answers: upper 25% = 1, lower 25% = 6

4. N = 50, number of wrong answers: upper 25% = 3, lower 25% = 8

5. N = 70, number of wrong answers: upper 25% = 4, lower 25% = 10

Accomplish Worksheet No. 1, Development of Assessment Tools.

The activity will evaluate whether you understood the lesson.

How much have you learned?

The following criteria may be considered in checking the output:

1. Content of the answers is well organized
2. Correctness of analysis
3. Timeliness of submission
4. Technicalities (spelling, punctuation, etc.)

Summary

The development of assessment tools assures that a teacher’s main objectives will be achieved, because effective strategies for assessing students’ learning outcomes are identified and validated. Assessment tools form a coherent system in which varied assessment strategies validate students’ learning outcomes and ensure fair and valid results. Assessment provides information about the effectiveness of instruction and the overall progress of students’ learning.
Required Readings

 March, Colin. Teaching Social Studies. National Library of Australia: Prentice-Hall of Australia.
 Calmorin, L. P. Measurement and Evaluation, Third Edition. Mandaluyong City: National Book Store.
 Calmorin, L. P. (2011). Assessment of Student Learning 1, First Edition. Rex Book Store, Inc.

References

 March, Colin. Teaching Social Studies. National Library of Australia: Prentice-Hall of Australia.
 Calmorin, L. P. Measurement and Evaluation, Third Edition. Mandaluyong City: National Book Store.
 Calmorin, L. P. (2011). Assessment of Student Learning 1, First Edition. Rex Book Store, Inc.
 ItemAnalysis07.pmd

Feedback

Development of
Assessment Tools

Answer the following questions:

1. A teacher constructed a test which would measure the students’ ability to apply previous knowledge to certain situations. In particular, the evidence that a student is able to apply previous knowledge is the ability to:

 Draw correct conclusions that are based on the information given;
 Identify one or more logical implications that follow from a given point of view; and
 State whether two ideas are identical, just similar, unrelated, or contradictory.

Write test items using the multiple-choice type of test that would cover these concerns of the teacher. Show your test to an expert and ask him or her to judge whether the items indeed cover these concerns. (10 points)

2. What is an expectancy table? Describe the process of constructing an expectancy table. When do we use an expectancy table? (10 points)

3. Enumerate the three types of validity evidence. Which of these types of validity is the most difficult to measure? Why? (10 points)
