
Chapter: Introduction to Test, Measurement and Evaluation

A. Test
A test may be called a tool, a question, a set of questions, or an
examination which is used to measure a particular characteristic of an
individual or a group of individuals.
It is something which provides information regarding individual’s
ability, knowledge, performance and achievement.
A test is a specific tool, procedure or technique used to obtain
responses from students in order to gain information which
provides the basis for making a judgment or evaluation regarding
characteristics such as fitness, skill, knowledge and values.
B. Measurement

• Measurement is an act or process that involves the assignment of
numerical values to whatever is being tested, so it involves the quantity
of something.
• It is the collection of information in numeric form.
• It is the record of performance, or the information required to
make a judgment.
• Measurement is therefore a process of assigning numerals to objects,
qualities or events in order to give quantitative meaning to such
qualities.
Cont…
• In the classroom, to determine a child’s performance, you need to obtain
quantitative measures on the individual scores of the child. If the child scores
80 in a physical education theory test, the score is the only interpretation you
should give it. You cannot say he has passed or failed.
• Measurement stops at ascribing the quantity but not making value judgement
on the child’s performance.
• If the test collects quantitative data, the score is a number.
• If the test collects qualitative data, the score may be a phrase or word such as
“excellent.”
NB: Together, the processes of testing, measuring, evaluating, identifying and
prescribing are called Assessment.
C. Evaluation

• Once you have completed the measurement of a particular attribute of
an individual, you must give meaning to it. This process of interpreting and
giving meaning is known as Evaluation.
• It is a technique by which we come to know to what extent the
objectives are being achieved.
• It is a decision-making process which assists in grading and
ranking.
• It is the process of education that involves collection of data from the
products which can be used for comparison with preconceived criteria
to make judgment.
TYPES OF EVALUATION

a. Formative Evaluation
• The purpose of formative evaluation is to find out whether after a learning
experience, students are able to do what they were previously unable to do.
Its ultimate goal is usually to help students perform well at the end of a
programme.
b. Summative evaluation
• Summative evaluation often attempts to determine the extent to which the
broad objectives of a programme have been achieved (e.g. SSSCE, NECO or
WAEC, promotion, Grade Two, NABTEB exams and other public
examinations). It is concerned with purposes, progress and outcomes of the
teaching-learning process.
The main reasons for test, measurement and evaluation in sport

• Motivation: if used correctly, measurement can highly motivate most individuals.
• In anticipation of a test, students usually study the material or practice the physical
task to be measured.
• This study or practice should therefore improve performance.
• The sequence is test, measurement, then feedback.
• Diagnosis: through measurement you can assess the weakness (need) and strength of a group or
individuals.
• Classification: to classify individuals into similar (homogeneous) groups.
• Achievement: determine the degree of achievement of program objectives and personal goals.
• Prediction: to predict future performance in sport.
• Research: used to find out meaningful solutions to a problem and as a means to expand body of
knowledge.
• To help teachers determine the effectiveness of their teaching techniques and learning materials;
1.2 TYPES OF TESTS
CHARACTERISTICS OF ESSAY TEST
• Generally essay tests contain more than one question in the test
• Essay tests are to be answered in writing only
• Essay tests require complete, long answers
• Essay tests are answered on the basis of recall from memory
Types of essay test
• Selective recall (basis given)
• Evaluation recall (basis given)
• Comparison of two things in general
• Summary of some unit of the text or of some article
• Analysis
• Illustrations or examples
• Discussions
• Criticism
Cont…
ADVANTAGES
• Can measure complex learning outcomes
• Emphasize integration and application of thinking and problem solving
• Can be easily constructed
• Examinee free to respond
• No guessing as in objective item
• Require less time for typing, duplicating or printing, can be written on board
• Can be used as device for measuring and improving language and expression skills
LIMITATIONS
• Lack of consistency in judgements even among competent examiners
• Limited content validity
• Some examiners are too strict and some are too lenient
• Difficult to score objectively
• Time consuming
• Lengthy enumeration of memorized facts
Cont…
SUGGESTIONS FOR SCORING ESSAY TESTS
• Prepare scoring guide in the form of outline
• A particular question should be scored for all examinees at one sitting
• To avoid the halo effect, the identity of the examinee should not be communicated
to the examiner
• If possible, appoint more than one examiner. The examiners should not know
who the other examiners are
• The correctness of the subject matter should not be mixed up with good
handwriting or better language; if these are to be given any weight, it should be
clearly indicated
TRUE/FALSE TESTS
• A true false item consists of a statement or proposition which the
examinee must judge and mark as either true or false
• ADVANTAGES
• It takes less time to construct true false items
• High degree of objectivity
• Teacher can examine students on more material
• LIMITATIONS
• High degree of guessing
• Largely limited to learning outcomes in the knowledge area
• They expose students to error which is psychologically undesirable
• They may encourage students to study and accept only oversimplified
statements of truth and factual learning
SUGGESTIONS
• Balance between true and false items
• Each statement should be unequivocally true or false. It should not be partly
true or partly false
• Double negatives should be avoided
• Long and complex statements should not be used, as they end up measuring
reading comprehension rather than the intended content
• Only one idea should be measured in one statement
• Explain which judgement is to be used true/false, yes/no, correct/incorrect
• Clues should be avoided
• Statements should not be taken directly from the textbook
MATCHING TYPE TESTS
• A test consisting of a two-column format, premises and responses, that requires the
student to make a correspondence between the two
ADVANTAGES
• Simple to construct and score
• Well suited to measure association
• Reduce the effect of guessing
• They can be used to evaluate the examinee’s understanding of concepts, principles,
and schemes for classifying objects, ideas or events
LIMITATIONS
• They generally provide clues
• They are restricted to factual information which encourages memorization
• If the same number of items is written in both columns, elimination converts the
matching type into multiple choice at a late stage and, for the final pair, into a
true-false choice
SUGGESTIONS
• Homogeneous items should be selected
• No clue should be provided in both the columns
• Clear instruction to attempt
• All the items should be printed on the same page
• Premise should be written in the left hand column and be numbered, responses
should be written in the right hand column and be lettered
• Responses should be more than the premises, to ensure that the examinee has to
think even for the last premise
• Clear directions
• Incomplete sentences should not be used for premise
1.3 CHARACTERISTICS OF A GOOD TEST

• A test is not something that is done in a careless or haphazard manner.
There are some qualities that are observed and analyzed in a good test.
• A good test should be valid: by this we mean it should measure what
it is supposed to measure or be suitable for the purpose for which it is
intended.
• A good test should be reliable: reliability simply means measuring
what it purports to measure consistently. On a reliable test, you can be
confident that someone will get more or less the same score on
different occasions or when it is used by different people.
Cont…
• A good test must be capable of accurate measurement of the
academic ability of the learner: a good test should give a true picture
of the learner. It should point out clearly areas that are learnt and areas
not learnt. All things being equal, a good test should isolate the good from the
bad. A good student should not fail a good test, nor should a poor student
pass it with flying colours.
• A good test should combine both discrete point and integrative test
procedures for a fuller representation of teaching-learning points. The
test should focus on both discrete points of the subject area as well as
the integrative aspects. A good test should integrate all various
learners’ needs, range of teaching-learning situations, objective and
subjective items.
Cont…
• A good test must represent teaching-learning objectives and goals: the
test should be conscious of the objectives of learning and objectives of
testing. For example, if the objective of learning is to master a particular
skill and apply the skill, testing should be directed towards the mastery and
application of the skill.
• Test materials must be properly and systematically selected: the test
materials must be selected in such a way that they cover the syllabus,
teaching course outlines or the subject area. The materials should be of
mixed difficulty levels (not too easy or too difficult) which represent the
specific targeted learners’ needs that were identified at the beginning of the
course.
Cont…
• Variety is also a characteristic of a good test. This includes a variety
of test types: multiple choice tests, subjective tests and so on. It also
includes a variety of tasks within each test: writing, reading, speaking,
listening, re-writing, transcoding, solving, organizing and presenting
extended information, interpreting, blank filling, matching, extracting
points, distinguishing, identifying, constructing, producing, designing,
etc. In most cases, both the tasks and the materials to be used in the
tests should be true to the life situation of what the learner is being
trained for.
1.4 Criterion for selecting appropriate test
A. Criterion-referenced Measurement
• Performance is judged against a pre-determined level that shows a student has achieved a
desired standard.
• Compared against the standard, not other individuals.
• Can be used to determine grades.
• Example: Run 2 miles in 13 minutes or less to get A.
B. Norm-referenced Measurement
• Used to judge an individual’s performance in relation to the performances of other members
of a well-defined group.
• (Developed by testing large group—common norming method: percentiles, Z-score, T-score)
• Norms may be based on sex, weight, height, age or grade level
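The common norming methods mentioned above (Z-scores and T-scores) can be sketched in a few lines. This is an illustrative sketch: the push-up scores and norm-group figures below are hypothetical, not from the text.

```python
# Norm-referenced scoring: compare a raw score against a norm group.

def z_score(raw, mean, sd):
    """Standard score: how many standard deviations the raw score
    lies above or below the norm-group mean."""
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    """T-score: the Z-score rescaled to a mean of 50 and SD of 10,
    which avoids negative values."""
    return 50 + 10 * z_score(raw, mean, sd)

# Hypothetical norm group for a one-minute push-up test.
norm_mean, norm_sd = 30, 5

print(z_score(40, norm_mean, norm_sd))  # 2.0 (two SDs above the mean)
print(t_score(40, norm_mean, norm_sd))  # 70.0
```

Unlike the criterion-referenced example (run 2 miles in 13 minutes for an A), these scores only say how an individual stands relative to the well-defined comparison group.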
Meaning of reliability, validity and
objectivity
Reliability
• Refers to the consistency of a test. By this, we mean, measuring what it purports to measure
consistently.
VALIDITY
• Validity of tests means that a test measures what it is supposed to measure or a test is
suitable for the purposes for which it is intended.
Objectivity
• A test has high objectivity when two or more persons can administer the same test to the
same group and obtain approximately the same result.
• Objectivity is a specific form of reliability and can be determined by the test- retest
correlational procedure.
• True-false, multiple-choice and matching tests have higher objectivity than essay tests.
VALIDITY OF THE TEST

• Validity of tests means that a test measures what it is supposed to measure or a test is
suitable for the purposes for which it is intended.
• Refers to the degree to which a test actually measured what it claims to measure.
• The validity coefficient indicates how well a test measures what it claims to measure.
• The coefficient ranges from +1 to -1; the closer it is to +1, the more valid the test.
Validity of Norm-referenced Tests
There are five types of validity for norm-referenced tests. These are:
Face validity: a test has face validity or logical validity, when it obviously measures the desired
skills or ability.
• Example: if a person is asked to run 400m as fast as possible, it is clear that, the purpose of
this test is to measure his or her speed.
• Face validity is used to measure components of physical fitness.
Cont….
Content validity: is related to how well a test measures all skills and subject matters
that have been presented to the students.
• This validity suggests the degree to which a test adequately and sufficiently
measures the particular skills, subject components, items function or behavior it sets
out to measure.
• It must measure the objectives for which the students are held responsible.
Predictive validity: when you wish to estimate future performance. Predictive
validity suggests the degree to which a test accurately predicts future performance.
Concurrent validity (Immediate predictive validity)
• Indicates how well the individual currently performs a skill.
Construct validity: refers to the degree to which the individual possesses a trait
presumed to be reflected in the test performance, e.g. anxiety, intelligence or
motivation, which cannot be seen by the human eye.
FACTORS AFFECTING VALIDITY

• Cultural beliefs
• Attitudes of testees
• Values – students often relax when much emphasis is not placed on
education
• Maturity – students perform poorly when given tasks above their
mental age.
• Atmosphere – Examinations must be taken under conducive
atmosphere
• Absenteeism – Absentee students often perform poorly
RELIABILITY OF THE TEST

• Refers to the consistency of a test. By this, we mean, measuring what it purports to


measure consistently.
• A test given to a group of students on one day should yield the same results if it is
given to the same group on another day. But the same students may not obtain the
same score on the second test; this is due to:
• Fatigue
• Motivation
• Environmental conditions
• Measurement error
• Reliability ranges from +1 to -1, and the closer the coefficient is to +1, the greater
the reliability.
There are three methods of estimating reliability of norm-
referenced test. These are:

1. Test-retest method: to estimate reliability by means of the test-retest method, the same
test is administered twice to the same group of pupils with a given time interval between the
two administrations of the test.
• The resulting test scores are correlated using the Product-Moment correlation method,
and the correlation coefficient provides a measure of stability over a given period of time.
• One important factor here is the time interval between the tests.
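The test-retest procedure above can be sketched as follows; the pupils' scores are made up for illustration, and the Pearson Product-Moment coefficient is computed directly from its definition.

```python
import math

def pearson_r(x, y):
    """Pearson Product-Moment correlation between two sets of paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores: the same test given twice to the same five pupils.
first_administration  = [55, 60, 72, 48, 80]
second_administration = [58, 62, 70, 50, 83]

# A coefficient close to +1 indicates a stable (reliable) test.
print(round(pearson_r(first_administration, second_administration), 2))
```

The same correlation routine serves the parallel-forms method, where the two score lists come from two equivalent forms of the test rather than two administrations of one form.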
2.Parallel forms method (Equivalent)
• This method involves giving two different but equivalent forms of a test to the same
group of pupils in close succession; the resulting test scores are then correlated.
• The correlation coefficient provides a measure of equivalence.
Note: The primary problem associated with this method is the difficulty of constructing two
tests that are parallel in content and item characteristics.
3.Split-half method

• In this method a test is split in two halves and the scores of the two halves are
correlated.
• A test is usually split into two equivalent sub-tests using odd- and even-numbered items.
However, the equivalence of the halves is often difficult to establish.
• This method requires only one administration of test.
• A common practice is to correlate the odd-numbered items with the even-numbered items
and then adjust the result using the Spearman-Brown formula.
• The main factor here is the time given to complete the test.
Formula:
• Reliability of full test = (2 × reliability of half test) / (1 + reliability of half test)
• What will be the reliability of the full test if the reliability of the half test is 0.53?
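The Spearman-Brown step-up formula and the worked question above can be checked in a couple of lines:

```python
def spearman_brown(half_reliability):
    """Reliability of the full test from the correlation between its two halves."""
    return (2 * half_reliability) / (1 + half_reliability)

# Worked question from the text: half-test reliability of 0.53.
print(round(spearman_brown(0.53), 2))  # 0.69
```

So a half-test reliability of 0.53 steps up to a full-test reliability of about 0.69; note that the formula always yields a value at least as large as the half-test coefficient (for positive coefficients).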
Cont…
• Reliability of tests is often expressed in terms of correlation coefficients.
Correlation concerns the similarity between two persons, events or
things. Correlation coefficient is a statistics that helps to describe with
numbers, the degree of relationship between two sets or pairs of scores.
• Positive correlations lie between 0.00 and +1.00, while negative
correlations lie between 0.00 and -1.00. A correlation at or close to zero
shows no reliability; a correlation between 0.00 and +1.00 shows some
reliability; a correlation of +1.00 shows perfect reliability.
• The procedures for computing the correlation coefficient include the Pearson
Product-Moment method and the Spearman-Brown formula discussed above.
Item Analysis

• Item analysis helps to decide whether a test is good or poor in two
ways:
• It gives information about the difficulty level of a question.
• It indicates how well each question shows the difference (discriminate)
between the bright and dull students. In essence, item analysis is used for
reviewing and refining a test.
Difficulty Level
• By difficulty level we mean the number of candidates that got a particular item
right in any given test. For example, if in a class of 45 students, 30 of the students
got a question correctly, then the difficulty level is 67% or 0.67. The proportion
usually ranges from 0 to 1 or 0 to 100%.
• An item with an index of 0 is too difficult, since everybody missed it, while one
with an index of 1 is too easy, as everybody got it right. Items with an index of
about 0.5 are usually suitable for inclusion in a test.
• Though the items with indices of 0 and 1 may not really contribute to an
achievement test, they are good for the teacher in determining how well the
students are doing in that particular area of the content being tested. Hence, such
items could be included. However, the mean difficulty level of the whole test
should be 0.5 or 50%.
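The difficulty-level calculation described above is simply a proportion; the sketch below reproduces the worked example from the text.

```python
def difficulty_index(num_correct, num_candidates):
    """Proportion of candidates who answered an item correctly (0 to 1).
    Low values mean a difficult item; high values mean an easy one."""
    return num_correct / num_candidates

# Worked example from the text: 30 of 45 students answered the item correctly.
p = difficulty_index(30, 45)
print(round(p, 2))  # 0.67
```

An index near 0.5 marks an item of moderate difficulty, which is why such items are preferred for inclusion; indices of 0 or 1 tell the teacher something about mastery of the content but contribute little to discriminating among students.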