
UNIVERSITY OF NORTHEASTERN PHILIPPINES

SCHOOL OF GRADUATE STUDIES


IRIGA CITY

EDUCATION 203: TEST CONSTRUCTION AND EVALUATION OF CURRICULUM

MIDTERM EXAMINATION
GROUP 3
NAME PERMIT NO.
1. Jonna Alexa SA, Lagarde 0200
2. Madonna O. Abante 0313
3. Diana H. Salvino 0349
4. Maria Editha V. Ferolino
5. Ruby V. Calatrava 0267
6. April Rose B. Abril
7. Hilario O. Olayon Jr. 0270
8. Julie C. Magante 0317
9. Ruby Rose V. Bisuna
10. Joval N. Batanes
11. Rizza A. Nares
12. Ely F. Panliboton
13. Mary Joy S. Acuna
14. Christine V. Berja

1. What is measurement? Differentiate it from Evaluation.

Measurement can be defined as the process by which information about the attributes or
characteristics of things is determined and differentiated; it involves the gathering of data.
Evaluation, on the other hand, is the process of summing up the results of measurements or tests
and giving them meaning based on value judgements. In addition, evaluation is much more
comprehensive and inclusive than measurement.

2. Why is evaluation considered an integral part of the teaching-learning process?


Evaluation is a broader term than ‘test’; it includes all types of tests and examinations. Its
purpose is not only to check the knowledge of the learner but all aspects of the learner.

1. It helps in forming value judgements about the educational status or achievement of students.
Evaluation in one form or another is inevitable in teaching-learning, as judgements need to be
made in every field of educational activity.
2. It contributes to the formulation of objectives, the design of learning experiences, and the
assessment of learner performance.
3. It is very useful for bringing about improvement in teaching and in the curriculum.
4. It provides accountability to society, parents, and the education system.
5. It provides feedback to teachers about their teaching and to learners about their learning.
3. How does a criterion-referenced test differ from a norm-referenced test?

A norm-referenced test assesses a test taker’s ability and performance against other test
takers; it can also compare one group of test takers against another group. This is done to
differentiate high and low achievers. The test’s content covers a broad area of topics that the
test takers are expected to know, and the difficulty of the content varies. The test must also be
administered in a standardized format. A norm-referenced test helps determine the position of
the test taker within a predefined population.

A criterion-referenced test assesses the test taker’s ability to understand a set curriculum.
In this test, a curriculum is set at the beginning of the class and is then explained by the
instructor. At the end of the lesson, the test is used to determine how much the test taker
understood. This test is commonly used to measure a test taker’s level of understanding before
and after instruction is given. It can also be used to determine how good the instructor is at
teaching the students. The test must cover only material that was taken up in class by the
instructor.

 
Definition
- Norm-referenced tests measure the performance of one group of test takers against another
  group of test takers.
- Criterion-referenced tests measure the performance of test takers against the criteria covered
  in the curriculum.

Purpose
- Norm-referenced: to measure how much a test taker knows compared to another student.
- Criterion-referenced: to measure how much the test taker knows before and after the
  instruction is finished.

Content
- Norm-referenced tests measure broad skill areas taken from a variety of textbooks and syllabi.
- Criterion-referenced tests measure the skills the test taker has acquired on finishing a
  curriculum.

Item characteristics
- Norm-referenced: each skill is tested by fewer than four items, and the items vary in difficulty.
- Criterion-referenced: each skill is tested by at least four items, to obtain an adequate sample
  of the student's performance.

4. What are the steps in constructing teacher-made tests?

STEPS IN CONSTRUCTING A TEST

- Planning of the Test
- Preparation of the Test Design
- Preparation of the Blueprint
- Writing of Items
- Preparation of the Scoring Key and Marking Scheme

PLANNING OF THE TEST - determining the objectives of the test, the maximum time allotment, and
the maximum marks

PREPARATION OF THE TEST DESIGN - important factors to consider in the design:

- Weight to Objectives
- Weight to Content
- Weight to Form of Questions
- Weight to Difficulty Level

Weight to Objectives - indicates what objectives are to be tested and what weight is to be given
to each objective

No.  Objective        Marks  Percentage
1    Knowledge          3      12%
2    Understanding      2       8%
3    Application        6      24%
4    Analysis           8      32%
5    Synthesis          4      16%
6    Evaluation         2       8%
     Total             25     100%

Weight to Content - indicates the various aspects of the content to be tested and the weight to
be given to each

No.  Content Topic  Marks  Percentage
1    Topic 1         15      60%
2    Topic 2         10      40%
     Total           25     100%

Weight to Form of Questions - indicates the forms of questions to be included in the test and
their weight

No.  Form of Question  Marks  Percentage
1    Objective Type      7      28%
2    Short Answer       14      56%
3    Essay Type          4      16%
     Total              25     100%

Weight to Difficulty Level - indicates the weight to be given to questions of different
difficulty levels

No.  Level of Difficulty  Marks  Percentage
1    Easy                   5      20%
2    Average               15      60%
3    Difficult              5      20%
     Total                 25     100%
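
As an illustration only (not part of the original answer), the short Python sketch below shows
how the mark allocations in the four weight tables can be derived from percentage weights and
the 25-mark total; the figures used are the ones quoted above.

```python
# Illustrative sketch only: derive the mark allocations in the weight tables
# from percentage weights and a 25-mark total (figures as quoted above).

TOTAL_MARKS = 25

objective_weights = {
    "Knowledge": 0.12, "Understanding": 0.08, "Application": 0.24,
    "Analysis": 0.32, "Synthesis": 0.16, "Evaluation": 0.08,
}

allocated = {obj: round(TOTAL_MARKS * w) for obj, w in objective_weights.items()}
for objective, marks in allocated.items():
    print(f"{objective:<13} {marks:>2} marks ({objective_weights[objective]:.0%})")

# The allocations should add up to the total marks of the test.
assert sum(allocated.values()) == TOTAL_MARKS
```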
BLUEPRINT

- A blueprint is a map and a specification for an assessment program, ensuring that all aspects
of the curriculum and educational domains are covered by the assessment program over a specified
period.
- The term “blueprint” is derived from the domain of architecture, where it means a “detailed
plan of action.”
- In simple terms, a blueprint links assessment to learning objectives. It also indicates the
marks carried by each question. It is useful to prepare a blueprint so that the faculty member
who sets the question paper knows which question will test which objective, which content unit,
and how many marks it will carry.

GUIDELINES / STEPS TO PREPARE A BLUEPRINT

- Content analysis
- Determination of learning objectives
- Determination of the number of items for each topic, based on the learning objectives
- Determination of the types of questions
CONTENT ANALYSIS

Content analysis is a means of dividing the whole content of the syllabus or course into a
systematic tabular form.

Unit      Sub-unit
Lesson-1  Topic 1.1
          Topic 1.2
          Topic 1.3
          Topic 1.4
Lesson-2  Topic 2.1
          Topic 2.2
          Topic 2.3

DETERMINATION OF LEARNING OBJECTIVES

Learning objectives are based on Bloom's taxonomy:

- Knowledge level
- Understanding level
- Application level

Sub-topic   Knowledge  Understanding  Application
Topic 1.1       2            1             1
Topic 1.2       2            2             1
Topic 1.3       2            1             0

DETERMINATION OF THE NUMBER OF ITEMS FOR EACH TOPIC

Unit      Sub-topic  Knowledge  Understanding  Application  Total
Lesson-1  Topic 1.1      2            2             1       5 items
          Topic 1.2      2            2             1       5 items
          Topic 1.3      2            1             0       3 items
Total                 6 items      5 items       2 items   13 items

In this way, the number of items for the other lessons can be determined; a small tallying
sketch follows the table.

Unit      Sub-topic  Knowledge  Understanding  Application  Total
Lesson-1  Topic 1.1      2            2             1       5 items
          Topic 1.2      2            2             1       5 items
          Topic 1.3      2            1             0       3 items
Lesson-2  Topic 2.1      3            2             1       6 items
          Topic 2.2      2            1             1       4 items
          Topic 2.3      1            2             1       4 items
          Topic 2.4      1            1             1       3 items
Lesson-3  Topic 3.1      2            1             1       4 items
          Topic 3.2      2            1             0       3 items
Total                17 items     13 items       7 items   37 items
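
As a rough illustration (the data structure and names below are assumptions, not from the
handout), this Python sketch tallies a blueprint like the one above, giving the per-level totals
and the overall number of items.

```python
# Assumed data structure (not from the handout): tally a blueprint like the one
# above, giving per-level totals and the overall number of items.

blueprint = {
    "Lesson-1": {"Topic 1.1": (2, 2, 1), "Topic 1.2": (2, 2, 1), "Topic 1.3": (2, 1, 0)},
    "Lesson-2": {"Topic 2.1": (3, 2, 1), "Topic 2.2": (2, 1, 1),
                 "Topic 2.3": (1, 2, 1), "Topic 2.4": (1, 1, 1)},
    "Lesson-3": {"Topic 3.1": (2, 1, 1), "Topic 3.2": (2, 1, 0)},
}
levels = ("knowledge", "understanding", "application")

level_totals = [0, 0, 0]
for lesson, topics in blueprint.items():
    for topic, counts in topics.items():
        print(f"{lesson:<9} {topic:<10} {counts} -> {sum(counts)} items")
        level_totals = [t + c for t, c in zip(level_totals, counts)]

for level, total in zip(levels, level_totals):
    print(f"{level}: {total} items")
print(f"total: {sum(level_totals)} items")  # 37 items, as in the table above
```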

WRITING OF ITEMS

- The test writer writes items according to the blueprint.
- The specified difficulty level has to be followed.
- It should be checked whether all items can be answered within the time set.
- Arrange the questions in order of difficulty.

PREPARATION OF THE SCORING KEY

No.  Answer  Marks
1      A     1 pt
2      B     1 pt
3      C     1 pt

5. Discuss how to conduct item analysis.

To conduct a good item analysis, we need to fill in the table below by following the given
procedures.

No. of pupils tested: _____

Item Number  Upper 27%  Lower 27%  Difficulty Index  Discrimination Index  Remarks (* Rejected, ** Retained, *** Revised)
1
2
3
4
5
PROCEDURES
a. Arrange the test papers from the highest to the lowest score.
b. Take the upper 27% and the lower 27% of the papers. (The number of test papers in the upper
27% is the same as in the lower 27%.)
c. From the upper 27%, count the number of correct responses to each item or question.
d. Do the same for the lower 27%: count the number of correct responses to each item or question.
e. Compute the difficulty index using either of these formulas:

   Difficulty Index = [ (correct responses in upper 27% / size of upper group) + (correct responses in lower 27% / size of lower group) ] / 2

   Difficulty Index = (correct responses in upper 27% + correct responses in lower 27%) / (size of upper group + size of lower group)

f. Then compute the discrimination index:

   Discrimination Index = (correct responses in upper 27% - correct responses in lower 27%) / size of one group

   Discrimination Index = (correct responses in upper 27% - correct responses in lower 27%) / [ (1/2)(size of upper group) + (1/2)(size of lower group) ]

g. From the given descriptions of the difficulty and discrimination index results, decide whether
the question should be retained, revised, or rejected. (A short computational sketch follows
these steps.)
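
A minimal Python sketch of steps (e) and (f), assuming equal-sized upper and lower groups; the
example figures are taken from item 1 of part III below (16 pupils per group, i.e. 27% of 60).

```python
# Minimal sketch of steps (e) and (f), assuming equal-sized upper and lower groups.

def difficulty_index(correct_upper, correct_lower, group_size):
    """Proportion of the combined upper and lower groups answering the item correctly."""
    return (correct_upper + correct_lower) / (2 * group_size)

def discrimination_index(correct_upper, correct_lower, group_size):
    """Difference between the two groups, expressed as a proportion of one group."""
    return (correct_upper - correct_lower) / group_size

# Item 1 of part III below: 60 pupils tested, so each 27% group has 16 pupils;
# 13 in the upper group and 6 in the lower group answered correctly.
print(round(difficulty_index(13, 6, 16), 2))      # 0.59
print(round(discrimination_index(13, 6, 16), 2))  # 0.44
```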

6. What is the importance of TOS?

A Table of Specifications is a two-way chart which describes the topics to be covered in a test
and the number of items or points which will be associated with each topic. Sometimes the types
of items are described as well.

The purpose of a Table of Specifications is to identify the achievement domains being measured
and to ensure that a fair and representative sample of questions appears on the test.

As it is impossible, in a single test, to assess every topic from every aspect, a Table of
Specifications allows us to ensure that our test focuses on the most important areas and weights
the different areas based on their importance and the time spent teaching them. A Table of
Specifications also gives us the proof we need that our test has content validity.

Tables of Specifications are designed based on:

- course objectives
- topics covered in class
- amount of time spent on those topics
- textbook chapter topics
- emphasis and space provided in the text

A Table of Specifications can be designed in three simple steps:

1. identify the domain that is to be assessed
2. break the domain into levels (e.g. knowledge, comprehension, application …)
3. construct the table


The test specifications for an exam program provide essential planning materials for the
test development process. Thorough, thoughtful test specifications can guide the remainder of the
test development process, especially item writing efforts and test assembly.
The TOS can help teachers map the amount of class time spent on each objective against the
cognitive level at which each objective was taught, thereby helping teachers to identify the
types of items they need to include on their tests.
A Table of Specifications allows the teacher to construct a test which focuses on the key
areas and weights those different areas based on their importance. A Table of Specifications
provides the teacher with evidence that a test has content validity, that it covers what should be
covered.

7. What is Difficulty index / Discrimination Index?

The Difficulty Index is a measure used to determine the difficulty level of a test item. It
gives the teacher the proportion of students who answered the test item correctly.

The Discrimination Index refers to how well an assessment differentiates between high and
low scorers. In other words, the teacher should be able to expect that high-performing students
would select the correct answer for each question more often than low-performing students.
If this is true, then the assessment is said to have a positive discrimination index (between 0 and
1), indicating that students who received a high total score chose the correct answer for a
specific item more often than students who had a lower overall score. If, however, the teacher
finds that more of the low-performing students got a specific item correct, then the item has a
negative discrimination index (between -1 and 0).
8. Discuss the steps in the U-L methods of item analysis.

1. Award of a score to each student

2. Ranking in order of merit

3. Identification of groups: high and low

4. Calculation of the difficulty index of a question

5. Calculation of the discrimination index of a question

6. Critical evaluation of each question, enabling a given question to be retained, revised or
rejected

1. Award of a score to each student


A practical, simple and rapid method is to perforate, on your answer sheet*, the boxes
corresponding to the correct answers. By placing the perforated sheet over the student's answer
sheet, the raw score (number of correct answers) can be found almost automatically.

* This format corresponds to a multiple-choice question test. However, the “item analysis”
technique can also be used for other types of assessment instruments.

[Figure: perforated scoring mask laid over a student's answer sheet]

2. Ranking in order of merit


Assuming that the scores of 21 students have been obtained (alphabetical list A on the left),
this step consists merely in ranking the students in order of merit (by score), proceeding from
the highest to the lowest. Let us take the list under A and rank the students to obtain
distribution B, with scores ranging from 4 to 27. (A short sorting sketch follows list B.)

A
Student Score
Albert 7
Alfred 13
Andrew 19
Ann 25
Brian 16
Christine 19
Elise 17
Emily 24
Felicity 16
Frances 14
Frank 26
Fred 17
Harriet 11
Ian 17
John 14
Jennifer 21
Margaret 16
Michael 9
Patrick 27
Peter 4
Philip 16
B
Order Student Score
1 Patrick 27
2 Frank 26
3 Ann 25
4 Emily 24
5 Jennifer 21
6 Christine 19
6 Andrew 19
8 Elise 17
8 Ian 17
8 Fred 17
11 Brian 16
11 Felicity 16
11 Margaret 16
11 Philip 16
15 Frances 14
15 John 14
17 Alfred 13
18 Harriet 11
19 Michael 9
20 Albert 7
21 Peter 4
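
A short Python sketch of steps 2 and 3 (the variable names are ours): it ranks the 21 scores
from list A and takes the top and bottom 27% as the high and low groups.

```python
# Sketch of steps 2 and 3: rank the 21 scores from list A, then take the top and
# bottom 27% of the ranked list as the high and low groups.

scores = {
    "Albert": 7, "Alfred": 13, "Andrew": 19, "Ann": 25, "Brian": 16,
    "Christine": 19, "Elise": 17, "Emily": 24, "Felicity": 16, "Frances": 14,
    "Frank": 26, "Fred": 17, "Harriet": 11, "Ian": 17, "John": 14,
    "Jennifer": 21, "Margaret": 16, "Michael": 9, "Patrick": 27, "Peter": 4,
    "Philip": 16,
}

ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
group_size = round(len(ranked) * 0.27)  # 27% of 21 students, about 6

high_group = [name for name, _ in ranked[:group_size]]
low_group = [name for name, _ in ranked[-group_size:]]
print("High group:", high_group)
print("Low group: ", low_group)
```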

3. Identification of high and low groups


Ebel¹ suggests the formation of “high” and “low” groups comprising only the first 27% (high
group) and the last 27% (low group) of all the students ranked in order of merit.

¹ Ebel, R.L. (1965). Measuring Educational Achievement. Prentice Hall, pp. 348-349.

Why 27%? Because 27% gives the best compromise between two desirable but contradictory
aims:

1. making both groups as large as possible;


2. making the two groups as different as possible.

Truman Kelley showed in 1939 that when each group consists of 27% of the total it can be said
with the highest degree of certainty that those in the high group are really superior (with respect
to the quality measured by the test) to those in the low group. If a figure of 10% were taken, the
difference between the two means of the competence of the two groups would be greater but the
groups would be much smaller and there would be less certainty regarding their mean level of
performance.

Similarly, if a figure of 50% were taken, the two groups would be of maximum size, but since the
basis of our ranking is not absolutely accurate, certain students in the high group would really
belong to the low group, and vice versa.

While the choice of 27% is the best, it is not markedly preferable to 25% or 33%; if it is
preferred to work with 1/4 or 1/3 rather than with the somewhat odd figure of 27%, there is no
great disadvantage in doing so.

For the rest of our analysis we shall use 33%.

4. Calculation of the difficulty index of a question

Difficulty index

An index for measuring the easiness or difficulty of a test question. It is the percentage (%) of
students who have correctly answered the question; it would be more logical to call it the
easiness index. It can vary from 0 to 100%.

Calculation

The following formula is used:

   Difficulty index = (H + L) / N x 100

where H = number of correct answers in the high group
      L = number of correct answers in the low group
      N = total number of students in both groups


5. Calculation of the discrimination index of a question

Discrimination index

An indicator showing how significantly a question discriminates between “high” and “low”
students. It varies from -1 to +1.

Calculation

The following formula is used:

   Discrimination index = (H - L) / (N/2)

where H, L and N are as defined above.

6. Critical evaluation of a question

This is based on the indexes obtained.

Difficulty index: the higher this index, the easier the question; it is thus an illogical term.
It is sometimes called the “easiness index”, but in the American literature it is always called
the “difficulty index”. In principle, a question with a difficulty index lying between 30% and
70%¹ is acceptable (in that range, the discrimination index is more likely to be high).

¹ Some authors give values between 35% and 85%.

If for a test you use a group of questions with indexes in the range 30% - 70%, then the mean
index will be around 50%. It has been shown that a test with a difficulty index in the range of
50% - 60% is very likely to be reliable as regards its internal consistency or homogeneity.

Discrimination index: the higher the index the more a question will distinguish (for a given
group of students) between “high” and “low” students. When a test is composed of questions
with high discrimination indexes, it ensures a ranking that clearly discriminates between the
students according to their level of performance, i.e., it gives no advantage to the low group over
the high group. In other words, it helps you to find out who are the best students.

It is most useful in preparing your question bank. Using the index², you can judge questions as
follows:

0.35 and over : Excellent question
0.25 to 0.34  : Good question
0.15 to 0.24  : Marginal question - revise
under 0.15    : Poor question - most likely discard

² Remember that the index has an indicative rather than an absolute value.

(A short classification sketch follows.)
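
As a small illustration (the function name is ours), the Python sketch below applies the
indicative bands above to a question's discrimination index.

```python
# Small illustration: apply the indicative bands above to a discrimination index.

def judge_question(discrimination_index):
    if discrimination_index >= 0.35:
        return "Excellent question"
    if discrimination_index >= 0.25:
        return "Good question"
    if discrimination_index >= 0.15:
        return "Marginal question - revise"
    return "Poor question - most likely discard"

print(judge_question(0.44))  # Excellent question
print(judge_question(0.19))  # Marginal question - revise
```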


9. What is validity? How is validity established?


VALIDITY is the extent to which a test measures what it claims to measure. It is vital for a test to
be valid in order for the results to be accurately applied and interpreted.

ESTABLISHING TEST VALIDITY

Validity
- the degree to which the test is capable of achieving certain aims
- it is sometimes defined as truthfulness
“In order for a test to be valid or truthful, it must first of all be reliable.”

KINDS OF VALIDITY
1. CONTENT VALIDITY
Content validity is related to how adequately the content of the test samples the domain about
which inferences are to be made. It is particularly important for achievement tests.
2. CRITERION-RELATED VALIDITY
Criterion-related validity pertains to the empirical technique of studying the relationship
between the test scores and some independent external measures (criteria).
Two Kinds of Criterion-Related Validity
A. Concurrent Validity
When the criterion measures are collected at approximately the same time as the test data, we
speak of concurrent validity.
B. Predictive Validity
When the criterion measures are gathered at a later date, we have a measure of predictive validity.
3. CONSTRUCT VALIDITY
Construct validity is the degree to which the test scores can be accounted for by certain
explanatory constructs in a psychological theory.

METHODS OF EXPRESSING VALIDITY
The most common procedures used in expressing validity are the PEARSON PRODUCT-MOMENT
CORRELATION COEFFICIENT (r), the SPEARMAN-BROWN PROPHECY FORMULA, and the KUDER-RICHARDSON
ESTIMATES (such as K-R20 and K-R21).
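
As an illustration of criterion-related validity expressed as a Pearson product-moment
correlation (the data below are made up for the example, and Python 3.10+ is assumed for
statistics.correlation), the sketch correlates test scores with an external criterion measure.

```python
# Illustrative data only: criterion-related validity expressed as a Pearson
# product-moment correlation (r) between test scores and an external criterion.
from statistics import correlation  # Pearson's r; requires Python 3.10+

test_scores = [25, 24, 21, 19, 17, 16, 14, 13, 11, 9]
criterion = [88, 85, 80, 82, 75, 74, 70, 72, 65, 60]  # e.g. later course grades

r = correlation(test_scores, criterion)
print(f"validity coefficient r = {r:.2f}")
```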

10. What is Reliability? How can it be established?


Reliability refers to how consistently a method measures something. If the same result can
be consistently achieved by using the same methods under the same circumstances, the
measurement is considered reliable.

High reliability is one indicator that a measurement is valid. If a method is not reliable, it
probably isn’t valid.

Types of Reliability
Different types of reliability can be estimated through various statistical methods.

Test-retest
The consistency of a measure across time: do you get the same results when you repeat the
measurement?
A group of participants complete a questionnaire designed to measure personality traits. If they
repeat the questionnaire days, weeks or months apart and give the same answers, this indicates
high test-retest reliability.
Interrater
The consistency of a measure across raters or observers: do you get the same results when
different people conduct the same measurement?
Based on an assessment criteria checklist, five examiners submit substantially different results
for the same student project. This indicates that the assessment checklist has low inter-rater
reliability (for example, because the criteria are too subjective).

Internal Consistency
The consistency of the measurement itself: do you get the same results from different parts of a
test that are designed to measure the same thing?
You design a questionnaire to measure self-esteem. If you randomly split the results into two
halves, there should be a strong correlation between the two sets of results. If the two results are
very different, this indicates low internal consistency.
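
As an illustrative sketch of the split-half approach to internal consistency (the item scores
below are invented, and Python 3.10+ is assumed for statistics.correlation), the two halves of a
test are correlated and the Spearman-Brown prophecy formula is applied to estimate the
reliability of the full-length test.

```python
# Invented scores, for illustration only: estimate internal consistency by the
# split-half method, then apply the Spearman-Brown prophecy formula.
from statistics import correlation  # Pearson's r; requires Python 3.10+

# Each position is one examinee's total on the odd-numbered / even-numbered items.
odd_half = [12, 10, 9, 8, 8, 7, 6, 5, 4, 3]
even_half = [11, 10, 10, 8, 7, 7, 5, 5, 3, 2]

r_half = correlation(odd_half, even_half)
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown correction to full length
print(f"half-test r = {r_half:.2f}, estimated full-test reliability = {r_full:.2f}")
```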

To establish reliability, it should be considered throughout the data collection process.
When you use a tool or technique to collect data, it is important that the results are precise,
stable and reproducible.

- Apply your methods consistently

Plan your method carefully to make sure you carry out the same steps in the same way for each
measurement. This is especially important if multiple researchers are involved.

For example, if you are conducting interviews or observations, clearly define how specific
behaviors or responses will be counted, and make sure questions are phrased the same way each
time.

- Standardize the conditions of your research

When you collect your data, keep the circumstances as consistent as possible to reduce the
influence of external factors that might create variation in the results. For example, in an
experimental setup, make sure all participants are given the same information and tested under
the same conditions.

II. Complete this Table of Specifications.

Contents   Budget of work/    %    No. of   Cog.   Aff.   Psycho.   Total   Item        Remarks
           time allotment           Items                                   Placement
I                7            28      14      5      5       4        14    1-14
II               6            24      12      4      4       4        12    15-26
III              4            16       8      3      3       2         8    27-34
IV               5            20      10      4      3       3        10    35-44
V                3            12       6      2      2       2         6    45-50
TOTAL           25           100      50                              50
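
As a sketch only (the names below are ours), the code shows how the figures in the completed
table follow from the time budget: each content area's share of the 25 periods fixes its
percentage, its share of the 50 items, and its item placement.

```python
# Sketch only: each content area's share of the 25-period budget fixes its
# percentage, its share of the 50 items, and its item placement in the test.

time_allotment = {"I": 7, "II": 6, "III": 4, "IV": 5, "V": 3}  # periods per content area
total_time = sum(time_allotment.values())  # 25
TOTAL_ITEMS = 50

placement_start = 1
for content, periods in time_allotment.items():
    percent = periods / total_time * 100
    items = round(TOTAL_ITEMS * periods / total_time)
    placement_end = placement_start + items - 1
    print(f"{content:<4} {periods:>2} periods  {percent:>3.0f}%  {items:>2} items  "
          f"placement {placement_start}-{placement_end}")
    placement_start = placement_end + 1
```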

III. Find the Difficulty Index and the Discrimination Index of these test items.
No. of pupils tested: 60

Item No.  Upper 27%  Lower 27%  Difficulty Index  Discrimination Index  Remarks
   1         13          6           0.59               0.44            ** Retained
   2         16         10           0.81               0.38            ** Retained
   3         15          7           0.69               0.50            ** Retained
   4         14          5           0.59               0.56            ** Retained
   5         10         10           0.63               0.00            * Rejected
   6         12          6           0.56               0.38            ** Retained
   7          9         12           0.66              -0.19            * Rejected
   8         11          5           0.50               0.38            ** Retained
   9          6          3           0.28               0.19            * Rejected
  10          8          4           0.38               0.25            *** Revised
  11          5          2           0.22               0.19            * Rejected

Difficulty Index
0 - .20   : Very Difficult
.21 - .80 : Average
.81 - 1.0 : Very Easy

Index of Discrimination
Below 0.19  : poor item
0.20 - 0.29 : marginal item
0.30 - 0.39 : good item
0.40 and up : very good item
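
As a closing check (a sketch, not part of the required answer), the code below recomputes the
indices for a few items in part III with a group size of 16 (27% of 60) and labels their
difficulty using the scale above.

```python
# Closing sketch: recompute the indices for a few items in part III (group size
# 16, i.e. 27% of 60) and label their difficulty using the scale above.

def classify_difficulty(p):
    if p <= 0.20:
        return "Very Difficult"
    if p <= 0.80:
        return "Average"
    return "Very Easy"

items = {1: (13, 6), 5: (10, 10), 7: (9, 12), 11: (5, 2)}  # item no.: (upper, lower) correct
GROUP_SIZE = 16

for number, (upper, lower) in items.items():
    difficulty = (upper + lower) / (2 * GROUP_SIZE)
    discrimination = (upper - lower) / GROUP_SIZE
    print(f"Item {number}: difficulty {difficulty:.2f} ({classify_difficulty(difficulty)}), "
          f"discrimination {discrimination:.2f}")
```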
