
Table of Specifications

What is a Table of Specifications?

A TOS, sometimes called a test blueprint, is a table that helps teachers align objectives, instruction, and assessment
(e.g., Notar, Zuelke, Wilson, & Yunker, 2004).

This strategy can be used for a variety of assessment methods but is most commonly associated with constructing
traditional summative tests.

A TOS is a two-way chart that relates the learning outcomes to the course content. It enables the teacher to prepare a test containing a representative sample of student knowledge in each of the areas tested.
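
For illustration, a small two-way chart might look like the following. The topics and item counts here are hypothetical, not taken from any particular curriculum:

Content Area                      | Remembering | Understanding | Applying | Total
----------------------------------+-------------+---------------+----------+------
Experiments and outcomes          |      2      |       2       |     2    |   6
Sample space and events           |      2      |       3       |     2    |   7
Probability of simple events      |      2      |       2       |     3    |   7
----------------------------------+-------------+---------------+----------+------
Total                             |      6      |       7       |     7    |  20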

Step 1- Determine the coverage of your exam

The first rule in making exams: when preparing a table of specifications, make sure that the coverage of your exam is something that you have satisfactorily taught in class. Select the topics that you wish to test. You may not be able to cover all of them, as that could create a test that is too long to be realistic for your students in the given time, so select only the most important topics.

Step 2- Determine your testing objectives for each topic area 

In this step, you will need to be familiar with Bloom's taxonomy of thinking skills. Bloom identified a hierarchy of learning objectives, from the lower thinking skills of remembering and understanding to the higher thinking skills of evaluating and creating.

Bloom's taxonomy has six categories (from lowest to highest): (1) Remembering, (2) Understanding, (3) Applying, (4) Analyzing, (5) Evaluating, and (6) Creating.
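
For example, each level is typically signaled by verbs such as the following (an illustrative mapping, not an exhaustive one):

Remembering – define, list, recall
Understanding – explain, summarize, classify
Applying – solve, demonstrate, compute
Analyzing – compare, contrast, differentiate
Evaluating – judge, justify, critique
Creating – design, construct, formulate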


So for each content area that you wish to test, you will have to determine how you will test it. Will you test simply their recall of knowledge? Or their comprehension of the matter? Or perhaps you will challenge them to analyze, or to compare and contrast? Again, this depends on your instructional objectives in the classroom: did you teach lower thinking skills, or did you challenge your students to think critically?

Step 3- Determine the duration for each content area

The next step in making the table of specifications is to write down how long you spent teaching each topic. This is important because it determines how many points you should devote to each topic.

Logically, the more time you spent teaching a topic, the more questions should be devoted to that area.

Example:

Topic: Experiments, Outcomes, Sample Space, and Events
Competency: Describes an experiment, outcome, sample space, and events.

Total teaching hours for the third quarter = 25 hours
Teaching hours for the topic = 3 hours
Percentage = (3/25) × 100 = 0.12 × 100 = 12%

Total number of items = 50 items
Number of items for the topic = 0.12 × 50 = 6 items
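
As a sketch, this proportional allocation can be computed for a whole quarter at once. In the Python below, only the first topic's hours and the 25-hour/50-item totals come from the example above; the other topics and their hours are hypothetical:

# Allocate items to each topic in proportion to teaching time.
# Only the first topic (3 hours) and the totals (25 hours, 50 items)
# come from the worked example; the remaining topics are hypothetical.
topics = {
    "Experiments, outcomes, sample space, and events": 3,
    "Probability of simple events": 5,
    "Probability of compound events": 7,
    "Mutually exclusive events": 10,
}

total_hours = sum(topics.values())   # 25 hours for the quarter
total_items = 50

for topic, hours in topics.items():
    percentage = hours / total_hours          # e.g., 3/25 = 0.12
    items = round(percentage * total_items)   # e.g., 0.12 * 50 = 6 items
    print(f"{topic}: {percentage:.0%} -> {items} items")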

Step 4- Determine the Test Types for each objective

Now that you have created your table of specifications by aligning your objectives to Bloom's taxonomy, it's time to determine the test types that will accomplish your testing objectives. For example, remembering-level objectives can be tested easily through multiple-choice or matching-type items.

Step 5- Polish your table of specifications

After your initial draft of the table of specifications, it's time to polish it. Make sure that your table of specifications covers the important topics that you wish to test, and that the number of items is feasible for the time allotted for the test. Then ask your academic coordinator to comment on your table of specifications; they will be able to give good feedback on how you can improve or modify it.
Item Analysis and Validation
 The teacher normally prepares a draft of the test. The draft is subjected to item analysis and validation to ensure that the final version of the test will be useful and functional.
 First, the teacher tries out the draft test on a group of students with characteristics similar to those of the intended test takers (try-out phase).
 From the try-out group, each item is analyzed in terms of its ability to discriminate between those who know and those who do not know the material, and in terms of its level of difficulty (item analysis phase).
 The item analysis provides information that allows the teacher to decide whether to revise or replace an item (item revision phase).
 Finally, the final draft of the test is subjected to validation if the intent is to use it as a standard test for the particular unit or grading period.
TWO IMPORTANT CHARACTERISTICS OF AN ITEM
(a) item difficulty, and
(b) discrimination index.

What is the Discrimination Index?


 The discrimination index is a basic measure of the validity of an item. It is a measure of an item's ability to
discriminate between those who scored high on the total test and those who scored low.
 Though there are several steps in its calculation, once computed, this index can be interpreted as an indication of
the extent to which overall knowledge of the content area or mastery of the skills is related to the response on an
item.
 Perhaps the most crucial validity standard for a test item is that whether a student gets an item correct should depend on their level of knowledge or ability, not on something else such as chance or test bias.
 An easy way to derive such a measure is to compare how difficult an item is for those in the upper 25% of the class with how difficult it is for those in the lower 25%. If the upper 25% of the class found the item easy yet the lower 25% found it difficult, then the item can discriminate properly between these two groups.
 Example: Obtain the index of discrimination of an item if the upper 25% of the class had a difficulty index of 0.60 (i.e., 60% of the upper 25% got the correct answer) while the lower 25% of the class had a difficulty index of 0.20.
 DU = 0.60 and DL = 0.20; thus, index of discrimination = 0.60 − 0.20 = 0.40.
 Difficulty index
 The correct response is B. Let us compute the difficulty index and the index of discrimination.
 Difficulty Index = (No. of students getting the correct response) / (Total no. of students)
 Difficulty Index = 40/80 = 0.50 or 50%
 Right difficulty; retain the item.

 The discrimination index can similarly be computed:
 Discrimination Index = DU − DL
 DU = (No. of students in the upper 25% with the correct response) / (No. of students in the upper 25%)
 DU = 15/20 = 0.75 or 75%
 DL = (No. of students in the lower 25% with the correct response) / (No. of students in the lower 25%)
 DL = 5/20 = 0.25 or 25%
 Discrimination Index = DU − DL = 0.75 − 0.25 = 0.50 or 50%
 Thus, the item has “good discriminating power”.
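
A minimal Python sketch of both computations, using the numbers from this example (40 of 80 students answered correctly overall; 15 of 20 in the upper 25% and 5 of 20 in the lower 25% answered correctly):

def difficulty_index(num_correct, num_students):
    # Proportion of examinees who answered the item correctly.
    return num_correct / num_students

def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    # DU - DL: difference between the upper 25% and lower 25% groups.
    du = upper_correct / upper_n
    dl = lower_correct / lower_n
    return du - dl

print(difficulty_index(40, 80))             # 0.50 -> 50%, right difficulty
print(discrimination_index(15, 20, 5, 20))  # 0.50 -> good discriminating power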

 The item-analysis procedure for norm-referenced tests provides the following information:
 The difficulty of the item
 The discriminating power of the item
 The effectiveness of each alternative

 Benefits derived from item analysis:
 It provides useful information for class discussion of the test
 It provides data that helps students improve their learning
 It provides insights and skills that lead to the preparation of better tests in the future
 Index of Difficulty
 P = (Ru + Rl) / T × 100
 where:
 Ru – the number in the upper group who answered the item correctly
 Rl – the number in the lower group who answered the item correctly
 T – the total number who tried the item
 It is also instructive to note that distracter A is not an effective distracter, since it was never selected by the students. Distracters C and D appear to have good appeal as distracters.
 The discriminating power of an item is reported as a decimal fraction; maximum discriminating power is indicated by an index of 1.00.
 Maximum discrimination is usually found at the 50 percent level of difficulty.
 Range of difficulty index:
 0.00 – 0.20: Very difficult
 0.21 – 0.80: Moderately difficult
 0.81 – 1.00: Very easy
Validation
 Validity is the extent to which a test measures what it purports to measure; it also refers to the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results. These two definitions differ in that the first refers to the test itself, while the second refers to the decisions the teacher makes based on the test. A test is valid when it is aligned to the learning outcome.
 A teacher who conducts test validation might want to gather different kinds of evidence. There are essentially three main types of evidence that may be collected: content-related evidence of validity, criterion-related evidence of validity, and construct-related evidence of validity.
 Content-related evidence of validity
 Content-related evidence of validity refers to the content and format of the instrument.
 How appropriate is the content?
 How comprehensive?
 Does it logically get at the intended variable?
 How adequately does the sample of items or questions represent the content to be assessed?
 Criterion-related evidence of validity
 Criterion-related evidence of validity refers to the relationship between scores obtained using the instrument and scores obtained using one or more other tests (often called the criterion).
 How strong is this relationship?
 How well do such scores estimate present or predict future performance of a certain type?
 Construct-related evidence of validity
 Construct-related evidence of validity refers to the nature of the psychological construct or characteristic being
measured by the test.
 How well does a measure of the construct explain differences in the behavior of the individuals or their
performance on a certain task?
 The usual procedure for determining content validity may be described as follows: The teacher writes out the
objectives of the test based on the table of specifications and then gives these together with the test to at least two
(2) experts along with a description of the intended test takers. The experts look at the objectives, read over the
items in the test and place a check mark in front of each question or item that they feel does not measure one or
more objectives.
 They also place a check mark in front of each objective not assessed by any item in the test. The teacher then rewrites any item so checked and resubmits it to the experts, and/or writes new items to cover those objectives not yet covered by the existing test. This continues until the experts approve all items and agree that all of the objectives are sufficiently covered by the test.
 In order to obtain evidence of criterion-related validity, the teacher usually compares scores on the test in question
with the scores on some other independent criterion test which presumably has already high validity. For example,
if a test is designed to measure mathematics ability of students and it correlates highly with a standardized
mathematics achievement test (external criterion), then we say we have high criterion-related evidence of validity.
In particular, this type of criterion-related validity is called its concurrent validity.
 Another type of criterion-related validity is called predictive validity wherein the test scores in the instrument are
correlated with scores on a later performance (criterion measure) of the students. For example, the mathematics
ability test constructed by the teacher may be correlated with their later performance in a Division wide
mathematics achievement test.
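
As a sketch, the correlation in question can be computed directly; the two score lists below are hypothetical:

import numpy as np

# Hypothetical scores: a teacher-made mathematics test vs. a
# standardized achievement test (the external criterion).
teacher_test = np.array([78, 85, 62, 90, 70, 88, 75])
criterion = np.array([80, 82, 60, 93, 72, 85, 78])

# Pearson r between the two score sets; a high r is evidence of
# concurrent (criterion-related) validity.
r = np.corrcoef(teacher_test, criterion)[0, 1]
print(f"r = {r:.2f}")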
 Apart from the use of correlation coefficient in measuring criterion-related validity, Gronlund suggested using the
so-called expectancy table. This table is easy to construct and consists of the test (predictor) categories listed on the
left hand side and the criterion categories listed horizontally along the top of the chart. For example, suppose that a
mathematics achievement test is constructed and the scores are categorized as high, average, and low.
 The criterion measure used is the final average grade of the students in high school: Very Good, Good, and Needs Improvement. The two-way table lists the number of students falling into each possible (test, grade) pair, as shown below.
 The expectancy table shows that 20 students obtained high test scores and were subsequently rated Very Good in terms of their final grades; 25 students obtained average scores and were subsequently rated Good; and 14 students obtained low test scores and were later graded as Needs Improvement. The evidence for this particular test tends to indicate that students getting high scores on it would later be rated Very Good, students getting average scores would be rated Good, and students getting low scores would be graded as Needs Improvement.
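
A minimal sketch of tallying an expectancy table from categorized scores and grades; the student records below are hypothetical:

import pandas as pd

# Each row is one student: their test-score category and final grade.
# The data are illustrative, not the counts described above.
df = pd.DataFrame({
    "test_score": ["High", "High", "Average", "Average", "Low", "Low"],
    "final_grade": ["Very Good", "Very Good", "Good", "Good",
                    "Needs Improvement", "Good"],
})

# pd.crosstab counts how many students fall into each (test, grade) pair.
expectancy = pd.crosstab(df["test_score"], df["final_grade"])
print(expectancy)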
Reliability
 Reliability refers to the consistency of the scores obtained: how consistent they are for each individual from one administration of an instrument to another and from one set of items to another. For internal consistency, for instance, we could use the split-half method or the Kuder-Richardson formulae (KR-20 or KR-21).
 Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid outcomes. As reliability improves, validity may improve (or it may not). However, if an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.
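
A minimal sketch of the KR-20 computation, assuming items are scored dichotomously (1 = correct, 0 = wrong); the response matrix below is hypothetical:

import numpy as np

def kr20(responses):
    # rows = students, columns = items, entries are 1 (correct) or 0 (wrong)
    X = np.asarray(responses, dtype=float)
    k = X.shape[1]                   # number of items
    p = X.mean(axis=0)               # proportion correct per item
    q = 1 - p
    totals = X.sum(axis=1)           # each student's total score
    var_total = totals.var(ddof=1)   # sample variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / var_total)

# Hypothetical responses: 5 students x 4 items
data = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
]
print(f"KR-20 = {kr20(data):.2f}")   # about 0.85 for this data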
Activity
 A. Find the index of difficulty in each of the following situations:
 1. N = 60; upper 25% = 2; lower 25% = 6
 2. N = 80; upper 25% = 2; lower 25% = 9
 3. N = 30; upper 25% = 1; lower 25% = 6
 4. N = 50; upper 25% = 3; lower 25% = 8
 5. N = 70; upper 25% = 4; lower 25% = 10

 Example (#1): N = 60; upper 25% = 2; lower 25% = 6
 Formula: P = (Ru + Rl) / T × 100
 Ru = 2 and Rl = 6
 Find T:
 T = (N × upper 25%) + (N × lower 25%)
 T = (60 × 0.25) + (60 × 0.25)
 T = 15 + 15 = 30
 P = (2 + 6) / 30 × 100 ≈ 26.67%
 B. Which of the items in Exercise A is found to be most difficult?
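
A short sketch that works through Exercise A with the formula above and answers Exercise B by taking the item with the lowest P:

# (item, N, correct in upper 25%, correct in lower 25%) from Exercise A
items = [
    (1, 60, 2, 6),
    (2, 80, 2, 9),
    (3, 30, 1, 6),
    (4, 50, 3, 8),
    (5, 70, 4, 10),
]

results = {}
for item, n, ru, rl in items:
    t = n * 0.25 + n * 0.25          # examinees in the two extreme groups
    p = (ru + rl) / t * 100          # index of difficulty
    results[item] = p
    print(f"Item {item}: T = {t:.0f}, P = {p:.1f}%")

# Exercise B: the most difficult item is the one with the lowest P.
hardest = min(results, key=results.get)
print(f"Most difficult: item {hardest}")   # item 1 (P ≈ 26.7%)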
