MIDTERM EXAMINATION
GROUP 3
NAME PERMIT NO.
1. Jonna Alexa S.A. Lagarde 0200
2. Madonna O. Abante 0313
3. Diana H. Salvino 0349
4. Maria Editha V. Ferolino
5. Ruby V. Calatrava 0267
6. April Rose B. Abril
7. Hilario O. Olayon Jr. 0270
8. Julie C. Magante 0317
9. Ruby Rose V. Bisuna
10. Joval N. Batanes
11. Rizza A. Nares
12. Ely F. Panliboton
13. Mary Joy S. Acuna
14. Christine V. Berja
Measurement can be defined as the process by which information about the attributes or
characteristics of things is determined and differentiated; it involves the gathering of data.
Evaluation, by contrast, is the process of summing up the results of measurements or tests and
giving them meaning based on value judgment. Evaluation is therefore much more comprehensive
and inclusive than measurement.
Evaluation in one form or another is inevitable in teaching and learning, as in all fields of
activity.
A norm-referenced test assesses the test taker’s ability and performance against other test
takers, or one group of test takers against another group. This is done to differentiate high and
low achievers. The test’s content covers a broad range of topics that the test takers are expected
to know, and the difficulty of the content varies. The test must also be administered in a
standardized format. A norm-referenced test helps determine the position of the test taker within
a predefined population.
A criterion-referenced test assesses the test taker’s mastery of a set curriculum. The
curriculum is set at the beginning of the class and explained by the instructor; at the end of the
lesson, the test is used to determine how much the test taker has understood. This kind of test is
commonly used to measure a test taker’s level of understanding before and after instruction is
given. It can also be used to gauge how effectively the instructor teaches. The test must cover
only material taught in class by the instructor.
Steps in constructing a test:
1. Planning of the Test
2. Preparation of the Test Design
3. Preparation of the Blueprint
4. Writing of Items
5. Preparation of the Scoring Key and Marking Scheme
PLANNING of a TEST - Determining the objectives of the test, the maximum time allotment,
and the maximum marks.
Weight to Objectives - Indicates which objectives are to be tested and what weight is to be
given to each objective.
Weight to Content - Indicates the various aspects of the content to be tested and the weight
to be given to each:

No.   Topic     Marks   Weight
1     Topic 1   15      60%
2     Topic 2   10      40%
      Total     25      100%
Weight to Difficulty Level - Indicates the difficulty levels of the questions to be included in
the test and their weight:

No.   Level       Marks   Weight
1     Easy        5       20%
2     Average     15      60%
3     Difficult   5       20%
      Total       25      100%
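The weight columns above are simply each row’s marks divided by the total marks. A minimal Python sketch, with the topic names and marks taken from the two tables above:

```python
# Compute percentage weights for a test plan, as in the tables above.
def weights(marks):
    """Map each entry to its share of the total marks, in percent."""
    total = sum(marks.values())
    return {name: round(100 * m / total) for name, m in marks.items()}

print(weights({"Topic 1": 15, "Topic 2": 10}))
# → {'Topic 1': 60, 'Topic 2': 40}
print(weights({"Easy": 5, "Average": 15, "Difficult": 5}))
# → {'Easy': 20, 'Average': 60, 'Difficult': 20}
```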
BLUEPRINT
A blueprint is a map and specification for an assessment program, ensuring that all aspects
of the curriculum and educational domains are covered by the assessment programs over a
specified period.
The term “blueprint” is derived from the domain of architecture, where it means “detailed plan
of action.”
In simple terms, a blueprint links assessment to learning objectives. It also indicates the marks
carried by each question. Preparing a blueprint is useful because the faculty member who sets
the question paper knows which question will test which objective and which content unit, and
how many marks it will carry.
It is a means of dividing the whole content of the syllabus or course into a systematic tabular
form.
Sub-topic   Knowledge   Understanding   Application
Topic 1.1   2           1               1
Topic 1.2   2           2               1
Topic 1.3   2           1               0
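A blueprint like this can be checked mechanically: each row’s cells sum to the marks allotted to that sub-topic, and the grand total should equal the test’s maximum marks. A minimal sketch using the numbers in the table above:

```python
# Blueprint: marks per sub-topic under each objective (from the table above).
blueprint = {
    "Topic 1.1": {"knowledge": 2, "understanding": 1, "application": 1},
    "Topic 1.2": {"knowledge": 2, "understanding": 2, "application": 1},
    "Topic 1.3": {"knowledge": 2, "understanding": 1, "application": 0},
}

per_topic = {topic: sum(cells.values()) for topic, cells in blueprint.items()}
total = sum(per_topic.values())
print(per_topic)  # marks per sub-topic
print(total)      # → 12
```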
WRITING OF ITEMS
To conduct a good item analysis, we need to fill in the table below, following the given
procedures.
No. of pupils tested: _____

Item Number   Upper 27%   Lower 27%   Difficulty Index   Discrimination Index   Remarks
1
2
3
4
5

Remarks: * Rejected, ** Retained, *** Revised
PROCEDURES
a. Arrange the test papers from highest to lowest.
b. Compute the upper 27% and the lower 27%. (The number of test papers in the upper 27% is
the same as in the lower 27%.)
c. For the upper 27%, count the number of correct responses to each item or question.
d. Do the same for the lower 27%: count the number of correct responses to each item or
question.
e. Compute the difficulty index using either of these formulas (both give the same result when
the two groups are of equal size):

Difficulty Index = [(correct responses in upper 27% / no. of papers in upper 27%) +
(correct responses in lower 27% / no. of papers in lower 27%)] / 2

Difficulty Index = (correct responses in upper 27% + correct responses in lower 27%) /
(no. of papers in upper 27% + no. of papers in lower 27%)

f. Compute the discrimination index:

Discrimination Index = (correct responses in upper 27% - correct responses in lower 27%) /
no. of papers in one group

g. From the given descriptions of the difficulty and discrimination indexes, decide whether each
question should be retained, revised or rejected.
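The formulas in steps e and f can be sketched in Python. This is a minimal sketch; the counts (13 and 6 correct responses) and the group size of 16 papers (roughly 27% of 60 pupils) are a hypothetical example:

```python
# Item analysis for one item, following the U-L procedure above.
# n_group = number of test papers in each of the upper and lower 27% groups.

def difficulty_index(upper_correct, lower_correct, n_group):
    """Proportion of the two groups that answered the item correctly."""
    return (upper_correct + lower_correct) / (2 * n_group)

def discrimination_index(upper_correct, lower_correct, n_group):
    """How much better the upper group did than the lower group."""
    return (upper_correct - lower_correct) / n_group

n = 16  # about 27% of 60 pupils
print(round(difficulty_index(13, 6, n), 2))      # → 0.59
print(round(discrimination_index(13, 6, n), 2))  # → 0.44
```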
The coverage of a test is typically based on:
course objectives
topics covered in class
the amount of time spent on those topics
textbook chapter topics
the emphasis and space provided in the text
The Difficulty Index is a measure used to determine the difficulty level of a test item. It helps
the teacher calculate the proportion of students who answered the item correctly.
The Discrimination Index refers to how well an assessment differentiates between high and low
scorers. In other words, the teacher should expect high-performing students to select the correct
answer for each question more often than low-performing students. If this is true, the assessment
is said to have a positive discrimination index (between 0 and 1), indicating that students who
received a high total score chose the correct answer for a specific item more often than students
with a lower overall score. If, however, more of the low-performing students got a specific item
correct, then the item has a negative discrimination index (between -1 and 0).
8. Discuss the steps in the U-L method of item analysis.
4.78
A practical, simple and rapid method is to perforate on your answer sheet* the boxes
corresponding to the correct answers. By placing the perforated sheet over the student's answer
sheet, the raw score (number of correct answers) can be found almost automatically.
* This format corresponds to a multiple-choice test. However, the item-analysis technique can
also be used for other types of assessment instrument.
4.79
Assuming that the scores of 21 students have been obtained (alphabetical list on the left), this
step consists merely in ranking the students in order of merit, proceeding from the highest to the
lowest score. Let us take the list under A and rank the students to obtain distribution B, with
scores ranging from 27 down to 4.
A
Student Score
Albert 7
Alfred 13
Andrew 19
Ann 25
Brian 16
Christine 19
Elise 17
Emily 24
Felicity 16
Frances 14
Frank 26
Fred 17
Harriet 11
Ian 17
John 14
Jennifer 21
Margaret 16
Michael 9
Patrick 27
Peter 4
Philip 16
B
Order Student Score
1 Patrick 27
2 Frank 26
3 Ann 25
4 Emily 24
5 Jennifer 21
6 Christine 19
6 Andrew 19
8 Elise 17
8 Ian 17
8 Fred 17
11 Brian 16
11 Felicity 16
11 Margaret 16
11 Philip 16
15 Frances 14
15 John 14
17 Alfred 13
18 Harriet 11
19 Michael 9
20 Albert 7
21 Peter 4
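The ranking step can be sketched in Python. In this minimal sketch, tied scores share the same rank and the next rank is skipped (6, 6, 8, ... as in distribution B above); the order among students with equal scores is arbitrary:

```python
# Rank 21 students from highest to lowest score (data from list A above).
scores = {
    "Albert": 7, "Alfred": 13, "Andrew": 19, "Ann": 25, "Brian": 16,
    "Christine": 19, "Elise": 17, "Emily": 24, "Felicity": 16,
    "Frances": 14, "Frank": 26, "Fred": 17, "Harriet": 11, "Ian": 17,
    "John": 14, "Jennifer": 21, "Margaret": 16, "Michael": 9,
    "Patrick": 27, "Peter": 4, "Philip": 16,
}

ordered = sorted(scores.items(), key=lambda kv: -kv[1])

ranks = []  # (rank, name, score); ties share the earlier rank
for pos, (name, score) in enumerate(ordered, start=1):
    if ranks and score == ranks[-1][2]:
        ranks.append((ranks[-1][0], name, score))  # tie: reuse previous rank
    else:
        ranks.append((pos, name, score))

for rank, name, score in ranks:
    print(rank, name, score)
```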
4.80
Ebel [1] suggests the formation of “high” and “low” groups comprising only the first 27% (high
group) and the last 27% (low group) of all the students ranked in order of merit.
[1] Ebel, R.L. (1965) Measuring Educational Achievement, Prentice Hall, pp. 348-349.
Why 27%? Because 27% gives the best compromise between two desirable but contradictory
aims:
Truman Kelley showed in 1939 that when each group consists of 27% of the total it can be said
with the highest degree of certainty that those in the high group are really superior (with respect
to the quality measured by the test) to those in the low group. If a figure of 10% were taken, the
difference between the two means of the competence of the two groups would be greater but the
groups would be much smaller and there would be less certainty regarding their mean level of
performance.
Similarly, if a figure of 50% were taken, the two groups would be of maximum size, but since
the basis of our ranking is not absolutely accurate, certain students in the high group would
really belong to the low group, and vice versa.
While 27% is the best choice, it is not greatly preferable to 25% or 33%; if it is more convenient
to work with 1/4 or 1/3 rather than the somewhat odd figure of 27%, there is no great
disadvantage in doing so.
Difficulty index
An index measuring the easiness or difficulty of a test question. It is the percentage (%) of
students who have correctly answered the question; it would be more logical to call it an
easiness index. It can vary from 0 to 100%.
Calculation: Difficulty index = (number of students answering the question correctly /
number of students tested) x 100
Discrimination index
An indicator showing how significantly a question discriminates between “high” and “low”
students.
Calculation: Discrimination index = (correct answers in the high group - correct answers in
the low group) / number of students in one group
4.81
If for a test you use a group of questions with indexes in the range 30% - 70%, then the mean
index will be around 50%. It has been shown that a test with a difficulty index in the range of
50% - 60% is very likely to be reliable as regards its internal consistency or homogeneity.
Discrimination index: the higher the index, the better a question distinguishes (for a given
group of students) between “high” and “low” students. When a test is composed of questions
with high discrimination indexes, it ensures a ranking that clearly discriminates between the
students according to their level of performance, i.e., it gives no advantage to the low group over
the high group. In other words, it helps you find out who the best students are.
It is most useful in preparing your question bank: using the discrimination index, you can judge
the quality of your questions.
4.82
High reliability is one indicator that a measurement is valid. If a method is not reliable, it
probably isn’t valid.
Types of Reliability
Different types of reliability can be estimated through various statistical methods.
Test-retest
The consistency of a measure across time: do you get the same results when you repeat the
measurement?
A group of participants complete a questionnaire designed to measure personality traits. If they
repeat the questionnaire days, weeks or months apart and give the same answers, this indicates
high test-retest reliability.
Interrater
The consistency of a measure across raters or observers: do you get the same results when
different people conduct the same measurement?
Based on an assessment criteria checklist, five examiners submit substantially different results
for the same student project. This indicates that the assessment checklist has low inter-rater
reliability (for example, because the criteria are too subjective).
Internal Consistency
The consistency of the measurement itself: do you get the same results from different parts of a
test that are designed to measure the same thing?
You design a questionnaire to measure self-esteem. If you randomly split the results into two
halves, there should be a strong correlation between the two sets of results. If the two results are
very different, this indicates low internal consistency.
To establish reliability, it should be considered throughout the data collection process. When
you use a tool or technique to collect data, it is important that the results are precise, stable
and reproducible.
Plan your method carefully to make sure you carry out the same steps in the same way for each
measurement. This is especially important if multiple researchers are involved.
For example, if you are conducting interviews or observations, clearly define how specific
behaviors or responses will be counted, and make sure questions are phrased the same way each
time.
When you collect your data, keep the circumstances as consistent as possible to reduce the
influence of external factors that might create variation in the results. For example, in an
experimental setup, make sure all participants are given the same information and tested under
the same conditions.
III. Find the Difficulty Index and the Discrimination Index of these test items.

No. of pupils tested: 60

Item Number   Upper 27%   Lower 27%   Difficulty Index   Discrimination Index   Remarks
1             13          6           0.59               0.44                   ** Retained
2             16          10          0.81               0.38                   ** Retained
3             15          7           0.69               0.5                    ** Retained
4             14          5           0.59               0.56                   ** Retained
5             10          10          0.63               0                      * Rejected
6             12          6           0.56               0.38                   ** Retained
7             9           12          0.66               -0.19                  * Rejected
8             11          5           0.5                0.38                   ** Retained
9             6           3           0.28               0.19                   * Rejected
10            8           4           0.38               0.25                   *** Revised
11            5           2           0.22               0.19                   * Rejected
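The table above can be recomputed programmatically. This is a minimal sketch: the retain/revise/reject cut-offs (retain if D >= 0.30, revise if 0.20 <= D < 0.30, otherwise reject) are an assumption inferred from the Remarks column, not rules stated in the exercise, and item 5's difficulty comes out 0.62 or 0.63 depending on the rounding convention:

```python
# Recompute difficulty (P) and discrimination (D) for the 11 items above.
items = [(13, 6), (16, 10), (15, 7), (14, 5), (10, 10), (12, 6),
         (9, 12), (11, 5), (6, 3), (8, 4), (5, 2)]  # (upper, lower) correct
n = 16  # papers per group: 27% of 60 pupils, rounded

rows = []
for num, (upper, lower) in enumerate(items, start=1):
    p = round((upper + lower) / (2 * n), 2)  # difficulty index
    d = round((upper - lower) / n, 2)        # discrimination index
    if d >= 0.30:
        remark = "retained"       # assumed cut-off, inferred from the table
    elif d >= 0.20:
        remark = "revised"
    else:
        remark = "rejected"
    rows.append((num, p, d, remark))
    print(num, p, d, remark)
```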