
Meaning of Test, Measurement and Evaluation

Testing, measurement and evaluation are essential elements of the educational process.
Like the two faces of a coin, educational activities such as course development, students'
progress, teachers' instruction and institutional goals go side by side with testing,
measurement and evaluation. Making decisions in education requires data, and that data
is obtained through tests; a decision made without any data may well be wrong.

Test
Definition 1:-
“Test is a measuring instrument”. The test is the most commonly used method of making
measurements in education.

Definition 2:-
“Test is an instrument through which information needed for evaluation purposes is
obtained”. Or “Testing is a technique by which data needed for evaluation purposes is
obtained”.
Tests, quizzes, scales and other measuring instruments are used to obtain such
information.

Definition 3:
“Test is defined as an instrument or activity used to collect data on a person's ability to
perform a specified task”. It is also an instrument or systematic procedure for measuring a
sample of behavior by posing a set of questions in a uniform manner. Tests are designed
to measure any quality, ability, skill or knowledge.

Classification of Tests
According to manner of response:
Oral or written
According to method of preparation:
Subjective / essay or objective
According to nature of answer:
Intelligence test, personality test, aptitude test, prognostic test, diagnostic test,
achievement test, preference test, accomplishment test, scale test, speed test, power test,
placement test.
Tests may also be classified as:
1. Standardized Test
a) Psychological test:- intelligence test, aptitude test, personality test (rating
scale), vocational and professional interest inventory
b) Educational Test

2. Teacher-Made Test
It includes planning, preparing, reproducing, administering, scoring, evaluating
and interpreting by the teacher according to classroom objectives.

MEASUREMENT: assignment of numbers


Definition 1:
A measurement takes place when a “test is given and a score is obtained”.
For instructional purposes the teacher has to measure students' progress in knowledge (the
cognitive domain). The affective and psychomotor domains are also measured for the
all-round development of students' personality. Here the teacher is quantifying how much
the learner has learned; that is, we quantify the qualities of students. The qualities usually
measured are academic achievement, intelligence, motivation, adjustment, sociability, etc.
The teacher's quality of teaching also has to be measured. But it is very difficult to
measure the qualities of human beings, because no two persons are alike in feelings,
drives or emotions.

Definition 2:
The systematic assignment of numerical values (quantitative) or verbal descriptors
(qualitative) to the characteristics of objects or individuals.
It means that data collected from measurement may be a score or a phrase or word. If the
test collects quantitative data, the score is a number. If the test collects qualitative data,
the score may be a phrase or word such as “excellent”.

Evaluation
Definition 1:-
Evaluation is defined as the process of examining a subject and rating it on the basis
of its important features. It is a process because it includes a series of steps
(establishing objectives, classifying objectives, designing the evaluation, selecting
indicators and comparing data with objectives). Assigning grade A, grade B or grade C
on the basis of marks is evaluation. Examining whether a product is expensive or
cheap before buying it is evaluation.

Definition 2:-
Evaluation is the systematic determination of the merit, worth and significance of
something or someone using criteria or standards. It is concerned with making
judgments on the worth or value of a performance. For example, if we want to buy a
house we may analyze its worth by criteria such as its area, location, covered area,
rooms and construction. Comparing students' performance against the standard of
passing at 40% is likewise evaluation.

Definition 3:-
An evaluation takes place when the results of measurement are summarized and given
meaning based on value judgments. It answers the question “how good, adequate, or
desirable?”. Judgment means that a decision is made based on the worth, utility, quality,
value or importance of the data gathered about an object. So in evaluation, data is
analyzed on the basis of some criteria.
For example, if a person has Rs. 500, deciding whether the money should be spent on a
good book or a good movie is evaluation, because a criterion or standard is used while
making the decision: it is judged whether the book or the movie has more worth. In
evaluation we determine how much or how little we value something. So evaluation is
arriving at a judgment on the basis of criteria which we have already selected.
Definition 4:-
Evaluation is the gathering of information for the purpose of making decisions.
In an educational context it is the process of obtaining, analyzing, and interpreting
information to determine the extent to which students achieve instructional objectives.

Steps involved in making an evaluation


1. Define the objective or the purpose of the test.
2. Measure the performance, i.e. administer the test.
3. Find or develop a standard.
4. Compare the person's performance on the test to the standard.
5. Make the evaluation, then discuss and distribute the results in the most appropriate
manner.
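
These steps can be sketched in code. The following is a minimal illustration only, assuming a hypothetical passing standard of 40% and illustrative grade cut-offs of 80% and 60%; none of these values are prescribed by the text.

```python
# A minimal sketch of the evaluation steps above. The pass standard and
# grade cut-offs are hypothetical, illustrative values.

def evaluate(score: float, max_marks: float = 100, pass_standard: float = 40.0) -> str:
    """Compare a measured score to a standard and return a judgment."""
    percent = 100 * score / max_marks          # the measurement (steps 1-2)
    if percent < pass_standard:                # compare to the standard (steps 3-4)
        return "Fail"
    if percent >= 80:                          # summarize as a grade (step 5)
        return "Grade A"
    if percent >= 60:
        return "Grade B"
    return "Grade C"

print(evaluate(73))   # -> Grade B
```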

Relationship between Measurement and Evaluation

1. Measurement & evaluation are connected


Decision making is a connected and continuous process. It includes preparation,
testing, measurement, evaluation and reflection.
(Cycle: Preparation → Testing → Measurement → Evaluation → Reflection, and back to Preparation)

Tests are made to gather data (information) for a particular purpose. When tests
are given to individuals, we obtain measurements of different traits of the individual.
Evaluation is the analysis of measurement.

2. Evaluation is based on measurement.


“Measurement is the process of getting data for the purpose of evaluation”. For
evaluation purpose data is required and that data is provided through test. When
tests are given to persons, results are in the form of scores. Scores are
measurements of different qualities, traits, attributes of a person or a thing.
Psychologists and educationists are concerned with the attributes and qualities of
individuals. The process of measurement converts the variable into variate (a
variable with a numerical value) which is used for evaluation. For example,
intelligence is quantified in terms of I.Q and achievement variable is measured in
terms of scores.

3. Interdependent Concepts
Measurement: Collection of information on which a decision is based.
Evaluation: The use of measurement in making decisions.
As is clear from the above definitions, measurement and evaluation are
interdependent concepts: evaluation is a process that uses measurements, and the
purpose of measurement is to accurately collect, using tests, the information needed
for evaluation. Improved measurement leads to accurate evaluation. Measurement is
the systematic collection of relevant data about the object being evaluated.
Evaluation is the systematic collection and analysis of evidence, by means of
measurement, in order to improve understanding of and to make judgments about
the object.

4. Qualitative and Quantitative aspect.


Gronlund and Linn (1990) view measurement as answering the question
“How much?” and evaluation as answering the question “How good?”. So
measurement estimates the quantitative aspect of an object or person, while
evaluation estimates the qualitative aspect.

5. Constant and Variable


Measurement remains constant whoever measures, but evaluation depends on the
frame of mind of the evaluator. For example, taking 1 kg of rice is measurement, but
determining whether it is of good quality or otherwise is evaluation.

6. Significant
Measurement is crucial for evaluation. Without measurement there is no positive
assurance that the judgments are accurate.
7. Objective and subjective
Measurement is objective, while evaluation may be objective or subjective.

Evaluation
Evaluation is a concept that provides information for decision making. It emerged as an
important process from testing and measurement. Evaluation is the systematic collection
and analysis of evidence (data) in order to improve understanding of, and to make
judgments about, the object or person being evaluated.
Purpose of Evaluation

1. Qualitative Improvement:- Its main objective is qualitative improvement. It is
used both in teaching and learning (the instructional process) and in the management
of organizational processes, for improving current activities and future planning.
2. Helps in Planning and Management:- The decisions made in evaluation help in
planning future activities. For example, a teacher may plan to use an appropriate
teaching method.
3. Report on Achievement:- Evaluation is the process of determining the extent to
which the objectives are achieved.
4. Motivation:- Evaluation has a motivational effect. For example if the teacher
continuously evaluates the students’ learning, the student tries to learn well.
5. Guidance can be given on the basis of evaluation.
6. Evaluation can help in bringing about changes.
7. Ensure optimum use of resources.
8. Present alternatives for decision makers to consider.
9. Placement in classes/ programs or grouping based on ability.
10. Diagnosis of weaknesses
11. Prediction about an individual, for example of the level of achievement in future
activities, or predicting one measure from another measure.

SCOPE OF EVALUATION

1. Educational Objectives:
An educational objective is often limited to what is mentioned in the prescribed
syllabus. An educational objective may better be defined “as a desired change in
the behaviour of a person that we try to bring about through education”. One of the
purposes of evaluation is to determine the extent to which educational objectives
are achieved.
2. Evaluation of instruction (Teaching and Learning)
Teachers use evaluation to be informed of their teaching effectiveness and to aid
them in improving their instruction. Three types of evaluation are used by
teachers.

a) Diagnostic Evaluation
b) Formative Evaluation
c) Summative Evaluation

Diagnostic evaluation is done before instruction. Its purpose is to know the entry
behavior of the students, i.e. what they already know, the standard they have
achieved and their level of maturity. Learning experiences are then provided in
accordance with the necessary background knowledge. If necessary, remedial classes
are arranged.

During instruction the teacher asks questions to get feedback about how students
are learning, or to what extent they are learning. So learning experiences are also
evaluated by using formative evaluation. When the performance of most of the
students on a particular concept or topic is not satisfactory, the learning experiences
should be planned again on the basis of the available evidence. Summative evaluation
is used by the teacher to know to what extent students have learned from the lesson.
Summative evaluation is done at the end of the lesson.
3. The Evaluation of Teacher:-
Also, the effectiveness of teaching should be evaluated. Although teaching is a vital
part of the system of education, the evaluation of teaching continues to receive little
attention. Teacher evaluation can be done through observations, classroom visits,
and evaluation by peers and students.

4. Evaluation of a Course/ Program /Curriculum


It is evaluated whether a course is according to the needs, maturity, previous
knowledge base and interests of students. Students and instructors can evaluate a
course. Outside evaluators can also provide insight into ways to strengthen or
improve a course. At least three specialists in the concerned field should evaluate
a course. Educational evaluation provides the direction for bringing about change
in the curriculum.

5. Classroom Environment: Evaluation of a classroom involves whether a


classroom is comfortable, spacious, furnished, ventilated etc.

6. Prognosis and selection of right professions:


Capacity in any field of human activity, such as arts, music, clerical work,
teaching or medicine, is defined as aptitude. Aptitude tests are used for the
prediction of capacity and success in a particular field. Such tests help in the
selection of the right subjects and professions.

7. Guidance: Every child faces a variety of situations and problems in his


environment. A problem may be educational or vocational. Sometimes it may be
personal or it may be social. Evaluation makes the individual differences clear.
Through evaluation specific difficulties of the learner can be identified, so
evaluation helps to give guidance.

8. School Evaluation: Organizations, including schools, are established to serve some
goals of the community. For example, it is the goal of a school to provide education
to the children of a locality. Through evaluation, information is received about
whether the objectives of the school are achieved or not, and the reasons for any
failure are identified.

9. Improvement of Public Relations: Pupil evaluation may also be used as a basis,
through reports to parents, for the improvement of public relations and the
mobilization of public opinion.
Principles of Evaluation

1. Purposeful
Evaluation should serve a certain purpose, and evaluation may be done for many
purposes. For example, it may be done to improve performance; this purpose of
evaluation is sometimes called “formative” because the results are used to improve
the process during its formative stages. In contrast, “summative” evaluation is used
when the purpose is to sum up or summarize whether the desired results were
achieved and, if achieved, to what extent. Diagnostic evaluation serves the purpose of
finding the present position, level and deficiencies of a process. Evaluation may
also be done to decide whether to continue a programme or to make some changes
in it. Decisions about the expansion or reduction of a programme, or about a change
in a policy, design or organizational structure, are also made by using evaluation.

The purpose of evaluation should be clearly known to all related persons. It is not
necessary that all stakeholders (staff, administrators, funders) have the same common
purpose. Although people may not completely agree about the purpose, it is
necessary to achieve clarity about the purpose of evaluation through discussion with
the major stakeholders.

2. Clarity
The objectives/purposes and process of evaluation should be clearly stated in
advance. It should be made clear what process, object or programme is to be
evaluated (what). The objectives/purposes of evaluation should be decided in
advance (why). The data collection method, analysis procedure and criteria of
evaluation should be predetermined (how). It should also be made clear when the
evaluation will be done (when).
3. Comprehensive
Evaluation should be comprehensive. This principle means that not just one
aspect of an object, process or programme should be evaluated; almost all
aspects should be. In practice, however, this is often not done. For example, it is a
well-known fact that evaluation in schools is limited to the knowledge and
understanding levels, neglecting the evaluation of skills and higher mental abilities.

4. Measurement
Evaluation should always be based on measurement. Measurement is crucial for
evaluation; if evaluation is done without measurement, it may be wrong.

5. Cooperation
Evaluation should be for the purpose of improvement and insight. In cooperative
evaluation the user acts as a helper. This more relaxed approach has a number of
advantages: the user can speak freely and criticize the system, and the evaluator can
clarify points of confusion.

6. Objectivity
Evaluation should be objective, i.e. the evaluator's own feelings, opinions and
perceptions should not affect the results of the evaluation.

7. Continuous
Evaluation should be a continuous and integral part of a process.

8. Whole
Evaluation should be of the whole process and not of products.

9. Communicated
Evaluation should be communicated in advance to the persons who are being
evaluated or involved in the evaluation.

10. Results
The results of evaluation should be utilized.
UNIT 2 TYPES OF EVALUATION
2.1 Placement Evaluation
It defines students' entry behavior, so it is also called "Entry Point
Evaluation". Through placement evaluation a pupil's entry behavior is
measured. It is carried out at the beginning of teaching or of a course and tells
about the readiness or fitness of an individual for the present course of
instruction.
Examples
1. When a child is admitted to a school for the first time, some basic
questions are asked relating to imagination and knowledge of the
environment. This is placement evaluation.

2. Suppose a student is interested in getting admission into a medical
college. Different tests related to knowledge of F.Sc are then taken, but the
student is also checked for medical fitness. If it is found that the eyesight is
very weak, the student is not capable of studying the demanding medical
course. Such tests are placement evaluation.

A diagnostic evaluation is an important process necessary to identify
individuals with sensory, manual or other impairments. It has the
purpose of identifying the presence or severity of learning disabilities,
including autism. Autism is a disorder of neural development involving
impaired social interaction and communication. A diagnostic evaluation is an
important process; however, it does not necessarily answer questions such
as what, where or how to most effectively educate a child.
It is often helpful to have an independent expert carry out a complete evaluation
of a child's unique needs in relation to both the child's Individual Education
Plan (IEP) and the actual implementation of that IEP. An evaluation
designed to answer these kinds of questions is often referred to as an
Educational Placement Evaluation or an independent program evaluation.

In some respects, an educational placement evaluation is similar to a diagnostic
evaluation; for example, both types of evaluation are done before starting a course
or activity.

Placement evaluation should be conducted by experienced
professionals working in the fields of psychology, behavior analysis and
special education at the masters or doctoral level.

Placement evaluation requires a variety of techniques such as
readiness tests, self-inventories, observation techniques and so on.

Readiness test scores are used to judge whether children are 'ready'
for a particular level of education.

Self-inventories often ask persons direct questions about the
symptoms, behaviors and personality traits associated with one of many mental
disorders or personality types, in order to gain insight into a patient's personality or
illness. Most self-report inventories can be completed within five to
fifteen minutes; others may take up to three hours.
The observation technique involves the direct observation of phenomena
in their natural setting.

Aptitude test: A test designed to determine a person's ability in a
particular skill or field of knowledge.

Achievement test: An achievement test is used to measure acquired
learning in a specific subject area, such as reading or arithmetic, in contrast
to an intelligence test, which measures potential ability or learning capacity.

Adaptive behavior includes the age-appropriate behaviors necessary
for people to live independently and to work safely and appropriately in
daily life.

2.2 Diagnostic Evaluation
Diagnostic evaluation is also known as pre-assessment.
Purpose
1. In practice, the purpose of diagnostic evaluation is to know, before teaching,
each student's strengths, weaknesses, and level of knowledge and skills.
2. Prior knowledge about students permits the teacher to remediate students and
adjust the curriculum to meet students' needs.
3. To identify problems and group students.
Nature: many questions related to general knowledge.
Frequency of administration: once, usually before teaching.
Tools
1. Pre-test
2. Checklists
3. Formal evaluation tools

2.3 Formative Evaluation
Formative evaluation determines how much students have learned and how much
they still have to learn.
Purpose
1. To promote learning.
2. Formative evaluation provides ongoing feedback to teachers and students.
Traditionally feedback has been in the form of comments written on students' work
or verbal comments given by the teacher. But not all feedback has to come from the
teacher: students can gain very useful feedback from each other and through
reflecting on their own learning. By giving feedback during teaching, the teacher can
ensure that feedback is not just something that happens at the end of a course but an
ongoing part of the learning process.
3. The evaluation provides information on progress.
4. It identifies and addresses areas that require further development.
Nature: few questions related to specifics.
Frequency of administration: frequently, before or during teaching.
Tools
1. Teachers' questions
2. Question and answer sessions
3. Classroom discussion
4. Portfolios
5. Observation
6. Interviews
7. Homework

2.4 Summative Evaluation
Summative evaluation is typically quantitative, using numeric scores or letter grades
to judge learners' achievement.
Purpose
1. Generally completed at the end of a unit of work to document the level of
achievement.
2. Selection.
3. Promotion.
4. Certification.
5. Grading.
6. Reporting to parents and authority.
Nature: many questions related to both specifics and general knowledge.
Frequency of administration: once, usually at the conclusion of teaching.
Tools: this evaluation may be as informal as writing one or two sentences about the
main points of a lecture, or a more formal type of evaluation such as
1. Written tests
2. Oral presentations
3. Project work
4. Quizzes
5. Problem-solving activities
6. Concept maps
7. Exams
8. Assigning a grade to a final exam
9. Essays
ACTIVITY: When the cook tastes the soup, that's formative evaluation;
when the guests taste the soup, that's summative evaluation.
Unit 3 Types of Tests
3.1 Essay Type Test
The essay test refers to any written test that requires an examinee to write a sentence, a
paragraph or longer passages and that demands a subjective judgment about its quality
and completeness when it is scored.
The distinctive features of essay questions are:
1. No single answer can be considered correct.
2. The examinee is permitted freedom of response.
3. The answers vary.
4. The answer needs a comprehensive explanation.

There are two main types of essay questions


a) Short response.
b) Extended response

a) Short Response
Short response questions are more focused and limited than extended response
questions. They require students to give two or three facts or pieces of
information, for example: write an example, list three reasons, or compare and
contrast two techniques. They are designed to take about 4 to 5 minutes to
complete, and the student is allowed up to 5 to 8 lines for each answer.

b) Extended Response
The extended response is an open-ended question in which students have to
provide detailed, much longer and more complex answers.

Extended response essays are further divided as follows:

i. Narrative Essay: Tells a story as a sequence of events. It can be either
fictional or from real life. There should be some point, lesson or idea concluded
from the narrative to make the essay meaningful, e.g. a thirsty crow.
ii. Descriptive Essay: The descriptive essay asks the student to describe a
person, object, place, emotion, situation, memory, experience, etc. It explains
the “what, why, how, when and where” of a topic, e.g. Quaid-e-Azam, a hot
summer day, a visit to a hill station, a football match.

iii. Persuasive / Argument Essay: The student makes a claim or takes a position
and supports it with statistics, expert opinion, examples and other evidence, e.g.
economic growth and environmental damage, use of organic versus regular food,
negative effects of television / the internet.

iv. Comparison Essay: Demonstrates similarities and differences between two
topics, e.g. compare essay and objective type tests.

v. Evaluation / Critical Essay: The purpose of an evaluation essay is to show


the overall quality, importance, value (or lack thereof) of something.
Evaluation should be based on clear and fair criteria, judgment and evidence.

vi. Cause / Effect Essay: Focuses on a condition or situation and asks either why
(cause) or what the result is (effect). For example, the effects of cigarette smoking,
or the causes of poverty.

Disadvantages

1. Poor content validity: In an essay type test less content can be sampled, so it has
poor / limited content sampling and therefore poor content validity.
2. Less content reliability: Students often fail to understand the questions and so are
not sure how to respond. The criteria in the mind of the examiner may also differ
from the response given by the examinee, and there may be more than one correct
response. Therefore the reliability of the response is low.
3. Less score reliability: A subjective measurement is one that can be interpreted in
different ways; an objective measurement is one that cannot. Essay type tests are
scored subjectively and so have less score reliability.
4. Time consuming: While responding to essay questions, time management is very
difficult for students because they have to provide detailed, much longer and more
complex answers. An essay type test also requires much more time to read and
grade than an objective type test.

Suggestions for improvement


1. Specific: They should specify how the students should respond. For example:
analyze the educational system in Pakistan under these headings:
i) educational objectives ii) curriculum iii) methods of teaching iv) management

2. Value / weight: They should provide information about the value / weight of the
question and how it will be scored. For example: what is curriculum in education,
and explain the principles of curriculum development.
3. Higher-level thinking: They should require higher-level thinking skills such as
analysis, synthesis and evaluation.
4. Provide a scoring scheme.
5. Provide ample time for answering.

Objective type test


Objective type items are those whose responses or answers are restricted, or in other
words have limited, fixed answers. The scoring of such items is so complete and
specific that it does not allow the examiner to make subjective inferences or judgments.
Here the correct answer is provided and the student has to choose it.

Types of objective type test


The following are examples of some types of objective type tests.

1. True / False Questions


True / false questions require a student to check whether a statement is true or not. A
true-false question is a specialized form of the multiple-choice format in which there are
only two possible alternatives. These questions can be used when the examiner wishes to
measure a student's ability to identify whether statements of fact are accurate or not. For
example:
a) Islamabad became the capital city when it was moved from Karachi in 1961. (True)
b) In 1971 East Pakistan became Bangladesh. (True)
However, true-false questions do have a number of limitations / demerits.

a) Guessing: A student has a 1 in 2 chance of guessing the correct answer to a
question.
b) It can be difficult to write a statement which is unambiguously true or false,
particularly for complex material.
c) The format does not differentiate among students of different abilities.
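
To see the size of the guessing problem, here is a small sketch that computes the chance of passing a true/false test by guessing alone, using the binomial distribution. The 10-item length and the pass mark of 6 are illustrative assumptions, not values from the text.

```python
# Chance of passing a true/false test purely by guessing.
# Illustrative assumptions: 10 items, pass mark of 6 correct, p = 0.5 per item.
from math import comb

def p_pass_by_guessing(n_items: int = 10, pass_mark: int = 6, p: float = 0.5) -> float:
    """Binomial probability of getting at least `pass_mark` items right."""
    return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(pass_mark, n_items + 1))

print(round(p_pass_by_guessing(), 3))   # about 0.377 for these values
```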

2. Multiple Choice
Multiple choice is a form of test in which examinees are asked to select the best
possible answer (or answers) from a list of choices. Multiple choice testing is
particularly popular in the United States. Frederick J. Kelly was the first to use multiple
choice tests.

Structure
Multiple choice items consist of a stem and a set of possible options. The stem is the
beginning part of the item: it presents the item as a problem to be solved, a question
asked of the respondent, or an incomplete statement to be completed, along with any
other relevant information. Usually there are 3 to 5 options. The options are the
possible answers that the examinee can choose from, with the correct answer called the
key and the incorrect answers called distracters. Only one answer can be keyed as correct.
This contrasts with multiple response items, in which more than one answer may be keyed
as correct. Usually a correct answer earns a set number of points towards the total marks,
and an incorrect answer earns nothing. However, tests may penalize students for incorrect
answers by negative marking, which discourages guessing. For example, the SAT removes
a quarter of a point from the examinee's score for an incorrect answer. An example item:
the IT capital of India is
A – Bangalore
B – Mumbai
C – Mysore
D – Mexico
The items of a multiple choice test are often called “questions”, but many items are not
written as questions; for example, they can be presented as incomplete statements or
mathematical problems. Thus the more general term “item” is the more correct label.
Items are stored in an item bank. A well written multiple choice question avoids
obviously wrong or silly distracters (such as Mexico in the above example).
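
The scoring rule described above can be sketched as follows. The one-mark credit, the quarter-mark penalty, and the keys and responses shown are all illustrative assumptions rather than fixed conventions.

```python
# Scoring a multiple choice test with negative marking.
# Assumptions (illustrative): +1 for a correct answer, -0.25 for an incorrect
# one, 0 for an omitted item. Keys and responses are hypothetical.

def score_mcq(key: list, responses: list, penalty: float = 0.25) -> float:
    total = 0.0
    for correct, given in zip(key, responses):
        if given is None:          # omitted item: no credit, no penalty
            continue
        total += 1.0 if given == correct else -penalty
    return total

key       = ["A", "C", "B", "D", "A"]
responses = ["A", "C", "D", None, "A"]   # one wrong, one omitted
print(score_mcq(key, responses))          # 3 correct - 0.25 = 2.75
```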

Advantages
 Very effective (can target factual knowledge, comprehension, analysis, synthesis
or evaluation)
 Versatile for all levels
 Minimum of writing for students
 Cost effective in terms of paper and time.
 Guessing reduced
 Can cover broad range of content.
 Ensure reliable efficient scoring
 Provide objectivity and validity

Disadvantages
 Difficult to construct good test items.
 Difficult to come up with probable distracters / alternative response.
 It takes more time and effort to prepare the test.
 Can’t test the skills like communication, psychomotor skills and interpersonal
skills.

3. Matching questions
Matching items allow students to pair items in one column to items in another
column. For example match the capital with the country.
Canada Tokyo
Italy Ottawa
Japan Rome
Students are required to match the associated options.

Strength / Merits
 Simple to construct
 Well suited to measure associations between facts.

Limitations / Demerits
 Difficult to measure learning objectives requiring more than simple recall of
information.
 Difficult to construct due to problem of selecting a common set of key words and
options.
 Options cannot be used more than once, so the questions are not mutually
exclusive; getting one answer incorrect automatically means a second answer is
also incorrect.

4. Fill in the blanks


These are sentence completion items: each item has a statement with one or more
words replaced by a blank line, requiring the student to supply the missing
word(s). While preparing a fill-in-the-blanks test, take care that:

 Only key words should be omitted.
 Sentences should not be taken directly from the text.
 There should not be too many blanks in one statement.

Unit # 4 Test Construction


4.1 Planning a test
The following basic decisions need to be made when planning a test.
Why: Decide the Purpose
The purpose of the test must be decided, i.e. whether the test is taken to make:
1. Placement decisions: deciding where in the instructional sequence the learner should
begin, so as to avoid unnecessarily repeating what the learner already knows. An example
is an entrance test.
2. Diagnostic decisions: deciding which learning activities the learner should engage in to
increase the chances of learning the objectives that the teacher has set for the individual.
Tests for diagnostic evaluation need enough items in each specific area.
3. Monitoring decisions: deciding whether the students are attending to the activities.
4. Readiness decisions: to make students mentally prepared for learning. Items will be:
i. limited in scope
ii. of a low difficulty level
iii. able to serve as a basis for remedial work and for adapting instruction
5. Monitoring students' learning progress during instruction, with the purpose of providing
ongoing feedback to students. Such evaluation is called formative evaluation. Items will:
i. monitor learning progress
ii. detect learning errors
iii. provide feedback for teacher and students
iv. sample a limited set of learning outcomes
v. mix difficult and easy items (e.g., review for the whole group, practice
exercises for a few)

6. End-of-instruction decisions:
 Mostly summative, with broad coverage of objectives
 Can be formative too

2. Choose the formats of test items to be used


1. Objective-supply-type
a) Short answer
b) Completion

2. Objective-selection-type
a) True-false
b) Matching
c) Multiple choice

3. Essays
a) Extended response
b) Restricted response

4. Performance-based
a) Extended response
b) Restricted response

Objective Items
 Strengths
 Can have many items
 Highly structured
 Scoring quick, easy, accurate

 Limitations
 Cannot assess higher level skills (Problem formulation, organization,
creativity)

Essay/Performance Tasks
 Strengths
 Can assess higher level skills (problem formulation, organization, creativity)
 More realistic

 Limitations
 Inefficient for measuring knowledge
 Few items (poorer sampling)
 Time consuming
 Scoring difficult, unreliable
3. Deciding the number of questions
The decision about how many questions to include on a test is based on the importance
of the objectives, the type of questions, the subject matter and the amount of time
available for testing.

4.2 Preparing the Test Items

1. Test content should match course objectives.
2. Important topics should be weighted more heavily than less important topics.
3. The testing time given to each topic should reflect the relative importance of the
topic.
4. Use an appropriate reading level (don't test for ancillary skills).
5. Avoid "tricky" and overly complex items.
6. Write so that an item provides no clues to other items.
7. Make sure objective type items can be answered without looking at the options.
8. Make sure options are 100% true or false.
9. Include as much of the item as possible in the stem; the stem should be long and
the options short.
10. Experts should agree on the answer.
11. Don't use more detailed, longer, or textbook language for correct answers.
12. Don't place answers in an identifiable pattern.
13. Write more items than needed.
14. Write well in advance of the testing date.
15. Most important of all, focus on important concepts; don't waste time testing
trivial facts.

5.3 Purpose of classroom testing

Classroom testing should be part of the teaching-learning process. It provides information
to the teacher as well as to the student.
1. It gives information about whether a student is prepared to learn the next topic.
2. It gives information about whether a student has mastered a specific instructional
objective.
3. It shows whether a review of past learning or an integration of such learning is needed.
4. It shows how a student's learning of a given topic might best be carried out, i.e. what
teaching methods and teaching aids are most suitable.

Unit # 6 Qualities of a Good Test


6.1 Reliability
When we say that a person or a car is reliable, we mean that we can predict to some
extent what they are going to do: we know that the person will arrive on time, and we
know the car will start. Much the same is said of a measure in education. If a measure
is reliable, we know that it is a good measure and that it will do what it should do. The
first meaning of reliability is internal consistency, that is, the extent to which all parts
of a measure are measuring the same thing. Here reliability is defined as the
consistency with which a given test reports the variable being measured, e.g. the
variable may be pupils' performance. The second meaning of reliability is stability over
time, that is, the extent to which the measure is likely to change over time.
Generally, longer tests are more reliable than short tests, and essay questions are less
reliable than objective questions.

Definition
Reliability is defined as the extent to which a test measures what it measures
consistently and accurately. A test is said to be reliable if it gives the same results for
the same persons each time.
Following are methods of checking reliability:

1. Test-Retest: An obvious way to estimate the reliability of a test is to administer it
to the same group of individuals on two occasions and correlate the paired scores.
The correlation coefficient obtained by this procedure is called a test-retest
reliability coefficient. For example, a physical fitness test may be given to a class
during one week and the same test given again the following week. If the test has
good reliability, each individual's relative position on the second administration of
the test will be near his or her relative position on the first administration of the
test.
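
A minimal sketch of the test-retest procedure: the reliability coefficient is simply the Pearson correlation of the paired scores. The score lists below are hypothetical.

```python
# Test-retest reliability: Pearson correlation of paired scores from
# two administrations of the same test. Scores below are hypothetical.
import numpy as np

week1 = [55, 62, 70, 48, 81, 66, 59]   # first administration
week2 = [57, 60, 73, 50, 79, 68, 61]   # second administration, same pupils

r = np.corrcoef(week1, week2)[0, 1]    # test-retest reliability coefficient
print(f"test-retest r = {r:.2f}")      # close to 1.0 indicates high stability
```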

2. Alternate Forms: This is also called equivalent-forms or parallel-forms reliability.
Two independent forms of the test are made such that both are equivalent: they
must have the same number of items, instructions, time limits, format, content,
range and level of difficulty, but the actual questions are not the same. Ideally,
test items should be written as pairs of equivalent questions, with one assigned to
form I and the other to form II.
The following are examples of equivalent questions:
On what continent is the Nile River?
On what continent is the Amazon River?
What is the capital of Italy?
What is the capital of France?
The alternate (equivalent) form of the test is administered to the same individuals.
The two forms are administered at the same time or immediately one after
another, and the scores on the two forms are correlated.

3. Split Half: This is the simplest of the internal consistency procedures. It
artificially splits the test into two halves, i.e. one big test of, say, 100 items is split
into two tests of 50 items each. Administer the test to a group, obtain the scores for
each individual on the two halves, and correlate the results of the two halves of the
test. If each individual has a very similar position on the two halves, the test
has high reliability.
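
A minimal sketch of the split-half procedure on a small matrix of item scores (hypothetical data), splitting into odd- and even-numbered items. The final Spearman-Brown step, which adjusts the half-test correlation up to the full test length, is standard practice but is not described in the text above.

```python
# Split-half reliability on a matrix of item scores (1 = correct, 0 = wrong).
# Data are hypothetical: 6 pupils x 8 items, split into odd and even items.
import numpy as np

scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
])

half_a = scores[:, 0::2].sum(axis=1)   # total on odd-numbered items
half_b = scores[:, 1::2].sum(axis=1)   # total on even-numbered items
r_half = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown correction to estimate full-length reliability (common
# practice, not described in the text above).
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, corrected r = {r_full:.2f}")
```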
4. Kuder-Richardson Procedures: Reliability can also be calculated by the following
two formulas developed by Kuder and Richardson.

Kuder-Richardson formula 20:

r = (k / (k - 1)) (1 - Σpq / s²)

where
r = reliability of the whole test,
k = number of items on the test,
s² = variance of scores on the total test,
p = proportion of correct responses on a single item,
q = proportion of incorrect responses on the same item,
and Σpq is summed over all items.

Kuder-Richardson formula 21:

r = (k / (k - 1)) (1 - x̄(k - x̄) / (k s²))

where
r = reliability of the whole test,
k = number of items in the test,
s² = variance of scores,
x̄ = mean of scores.
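
A minimal sketch computing both Kuder-Richardson coefficients from a matrix of 0/1 item scores, following the two formulas above. The data are hypothetical.

```python
# KR-20 and KR-21 from a matrix of item scores (1 = correct, 0 = wrong).
# Data are hypothetical: rows are pupils, columns are items.
import numpy as np

scores = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
])

k = scores.shape[1]                 # number of items
totals = scores.sum(axis=1)         # each pupil's total score
s2 = totals.var()                   # variance of total scores
p = scores.mean(axis=0)             # proportion correct per item
q = 1 - p                           # proportion incorrect per item

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / s2)
mean = totals.mean()
kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * s2))
print(f"KR-20 = {kr20:.2f}, KR-21 = {kr21:.2f}")
```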
6.2 Validity
Validity is the most important criterion for the quality of a test. The term validity refers
to whether or not the test measures what it claims to measure. On a test with high
validity the items will be closely linked to the purpose of the test. This means that if the
purpose of a test is the granting of licenses or certification, then the items must be highly
related to the specific job or occupation. If a test has poor validity, it does not measure
the job-related content and competencies, and in that case there is no usefulness in using
the test results for their intended purpose. There are several ways to estimate the validity
of a test, including content validity, construct validity, concurrent validity and predictive
validity.

1. Content Validity: Content validity is the extent to which a test measures the content
domain it was designed to measure. Another way of saying this is that content
validity concerns, primarily, the adequacy with which the test items representatively
sample the content area to be measured. It is also called curricular validity. In
educational measurement we are mainly concerned with curricular validity, i.e. the
extent to which the content of the test truly represents the content of the course. For
example, a comprehensive math achievement test would lack content validity if good
scores depended primarily on knowledge of English, or if it only had questions about
one aspect of math (e.g. algebra). A test made for granting licenses and certification
will be content valid if it does measure job-related content and competencies.

2. Criterion Validity: In criterion validity a well established measurement procedure,
the criterion, is used to validate a new measurement test, both measuring the same or
a similar construct. There are two types of criterion validity: predictive and
concurrent.
a) Predictive Validity: Predictive validity is a measure of how well a test predicts
future performance. For predictive validity the test scores are collected first; then at
some later time (weeks, if not months or years) the criterion measure is collected.
Examples
1. Predictive validity is used in tests made for selection purposes.
2. It is used in aptitude tests.
Suppose we want to know the predictive validity of the NTS tests used nowadays for
entrance to university programs. To establish predictive validity, the NTS scores are
collected first; after some months the first-term results of the same students are
collected, and the two sets of scores are correlated.
b) Concurrent Validity: Concurrent validity is a measure of how well a test
correlates with a measure that has been previously validated. The two measures may
be for the same construct or for different but related constructs, and they are taken at
the same time. Concurrent validity thus measures how well a test relates to a measure
of a similar construct; the existing, validated test is the criterion here.
Example: A new measure of creativity should correlate with existing measures of
creativity.

3. Construct Validity: In educational research, variables such as motivation, self-
esteem and intelligence are abstract concepts that cannot be directly measured or
observed. Researchers therefore select specific sets of observable tasks believed to be
indicators of the construct. Construct validity is the extent to which a test or measure
reflects the particular construct of interest.

4. Coefficient (Cronbach's) Alpha: Coefficient alpha, sometimes called Cronbach's
alpha, was developed by Cronbach. It is the average of the correlations obtained from
all possible ways of dividing the test into two sets.
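
A minimal sketch computing coefficient alpha from an item-score matrix, using the standard formula alpha = (k / (k - 1)) (1 - sum of item variances / variance of total scores); the data are hypothetical. For 0/1 items this reduces to KR-20.

```python
# Cronbach's alpha from an item-score matrix (rows = pupils, cols = items).
# Data are hypothetical; items may be 0/1 or graded.
import numpy as np

scores = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
])

k = scores.shape[1]
item_vars = scores.var(axis=0)       # variance of each item
total_var = scores.sum(axis=1).var() # variance of pupils' total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"alpha = {alpha:.2f}")
```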

6.3 Adequacy
This is another characteristic of a measure that we are concerned with. The following
points must be considered for an adequate test.
Sampling: A test is a sample of one's knowledge or skill; it is not possible to
measure one's total knowledge or skill. A test's purpose is to select a sample from
which to generalize. This points to the need for adequate sampling so that results
may be reliable. For example, a teacher cannot include the whole syllabus in a test,
but the questions selected should represent the total syllabus.

Item difficulty: Perhaps "item difficulty" should have been named "item easiness":
it expresses the proportion or percentage of students who answered the item
correctly. Item difficulty can range from 0.0 (none of the students answered the item
correctly) to 1.0 (all of the students answered the item correctly). Experts recommend
that the average level of difficulty for a four-option multiple choice test should be
between 60% and 80%; an average level of difficulty within this range can be
obtained, of course, even when the difficulty of individual items falls outside this
range. If an item has a low difficulty value, say less than .25, there are several
possible causes:
the item may have been miskeyed;
the item may be too challenging relative to the overall level of ability of the class;
the item may be ambiguous or not written clearly;
there may be more than one correct answer.
Further insight into the causes of a low difficulty value can often be gained by
examining the percentage of students who chose each response option. For example,
when a high percentage of students chose a single option other than the one keyed
as correct, it is advisable to check whether a mistake was made on the answer
key.
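
A minimal sketch computing item difficulty values from a matrix of 0/1 responses and flagging items below the .25 value mentioned above; the data are hypothetical.

```python
# Item difficulty: proportion of students answering each item correctly.
# Data are hypothetical: rows = students, cols = items (1 = correct).
import numpy as np

scores = np.array([
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
])

difficulty = scores.mean(axis=0)    # one value per item, 0.0 to 1.0
for i, p in enumerate(difficulty, start=1):
    flag = "  <- review (p < .25)" if p < 0.25 else ""
    print(f"item {i}: p = {p:.2f}{flag}")
```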
Theory: Most important, the theory underlying the construct to be measured must be
considered.

Construct: Second, the adequacy of the test in measuring the construct is evaluated.
For example, suppose that a researcher is interested in measuring the introverted
nature of B.Ed students. The researcher defines introversion as the overall lack of
social skills such as conversing, meeting and greeting people, and attending faculty
social functions. This definition is based upon the researcher’s own observations. A
panel of experts is then asked to evaluate this construct of introversion. The panel
cannot agree that the qualities pointed out by the researcher adequately define the
construct of introversion. Furthermore, the researcher cannot find evidence in the
research literature supporting the introversion construct as defined here. Using this
information, the validity of the construct itself can be questioned. In this case the
researcher must reformulate the previous definition of the construct.

Other Variable: Once a meaningful, usable construct has been developed, the
adequacy of the test used to measure it must be evaluated. First, data from the new
test should be gathered and compared with data from other tests made for measuring
a similar construct. This gives the convergent validity of the new test. For example, a
mathematical reasoning test would correlate with grades in mathematics or with
another math reasoning test.
After that, the discriminant validity of the test must be determined. Here data from
the new test is compared with data from other tests made for measuring a different
construct. For example, scores on the math reasoning test should have little or no
relationship with measures of other skills, such as reading. Convergent validity tests
that constructs expected to be related are, in fact, related. Discriminant validity (or
divergent validity) tests that constructs that should have no relationship do, in fact,
not have any relationship.
Ease of construction and scoring
This quality focuses on how easy it will be to devise the test tasks and to score them.
Some procedures, such as interviews and individual observations of pupils'
performance, require a long time to complete. Essay tests, term papers, projects and
written work require much teacher time to grade and evaluate.
6.4 Objectivity: Objectivity is a main quality of a test. Educational testing should be
as objective as possible, although complete objectivity is never possible.
Test objectivity concerns the relation between the administering and scoring
activities and the resulting scores. Objectivity is the characteristic related to personal
error. A test procedure is said to be objective if two or more observers of a pupil's
performance can agree on the report of the performance.
A test is considered objective if the procedures for administration and scoring
are uniform, i.e. the same. So objectivity means universal agreement in the
administration and scoring of a test.

Standardization: Standardized procedures for administering a test should be used.
Standardization refers to uniformity of the conditions and procedures for
administering a test: every person taking the test should have the same directions,
test items, time limit and conditions. B.I.S.E exams are an example of standardization.
A test can be made objective by adopting standardized administration procedures.

Scoring: Objectivity primarily refers to the scoring of test results. The scoring
process must be free of subjective judgment or bias. A test is objectively scored if
everyone scoring it obtains the same results. A subjective test or essay examination
will obviously have less objectivity.

Objective Test: These are tests for which the scoring process is free of personal
judgment or bias. They contain multiple-choice and true-false items, and scoring is a
mechanical process that requires no special training or knowledge.

Subjective Test: Tests that contain essay questions. Here the scoring process can be
influenced by the personal characteristics and attitudes of the scorer. A test can be
made more objective by adopting the following steps while scoring.
Achievement: Scoring should reflect only the student's achievement. Other elements
such as behavior, participation, attitude and attendance should not influence the scoring.
Punishment & Reward: Scores should not be misused as punishment or reward.
There should be a policy for misbehavior, poor attendance, tardiness etc., and these
elements should not influence the scoring.

Cheating: Cheating is primarily a disciplinary problem and can directly influence


the achievement scores. Strict measures should be adopted to discourage cheating.

Variety of Methods: If possible, a variety of methods such as quizzes, assignments
and projects should be used to evaluate students. Using a variety of methods gives
more objective scores.

Uniformity: Before scoring, all the scorers should discuss the criteria of marking so
that items are scored uniformly. Multiple choice items should be scored with a key,
and essay type tests should be scored by establishing a marking scheme.
6.5 Differentiability: A test should give as much information as possible about
differences between test takers on the characteristic being measured. People with high
levels of the characteristic should get high scores on the test and people with low levels
should get low scores, and scores should spread out widely across the range of possible
scores rather than group together at one or two points. When a measure meets these
standards, psychometricians say that it has high discriminating power, that is, it
differentiates well between people with high and low levels of the characteristic being
measured.

If the test and an item measure the same ability or competence, we would expect that
those having a high overall test score would have a high probability of being able to
answer the item. We would also expect the opposite, which is to say that those
having low test scores would have a low probability of answering the item correctly.
Thus, a good item should discriminate between those who score high on the test and
those who score low.
Usually two ways of determining the discriminative power of an item are used: the
discrimination index and the discrimination coefficient. The discrimination index has
the formula:

Di = (GA correct answers - GB correct answers) / N largest group

where:
Di = discrimination index of item i,
GA correct answers = number of correct answers to item i among the 27% of
examinees with the highest test scores,
GB correct answers = number of correct answers to item i among the 27% of
examinees with the lowest test scores,
N largest group = number of persons in the larger group (GA or GB).

The higher the discrimination index, the better the item can distinguish between those
with high test scores and those with low ones. If all the persons in GA answer an item
correctly and all the persons in GB answer it incorrectly, then D = 1, the maximum
value of discrimination.
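
A minimal sketch of the discrimination index, forming the upper and lower 27% groups from total scores and applying the formula above to each item; the data are hypothetical.

```python
# Discrimination index: compare the top 27% and bottom 27% of examinees.
# Data are hypothetical: rows = examinees, cols = items (1 = correct).
import numpy as np

scores = np.array([
    [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1], [1, 0, 1, 1],
    [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 0],
    [1, 1, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0],
])

totals = scores.sum(axis=1)
order = np.argsort(totals)                 # examinees sorted low to high
n = max(1, round(0.27 * len(scores)))      # size of each 27% group
low_group, high_group = order[:n], order[-n:]

ga = scores[high_group].sum(axis=0)        # correct answers in top group
gb = scores[low_group].sum(axis=0)         # correct answers in bottom group
d_index = (ga - gb) / n                    # one value per item, -1 to +1
print(np.round(d_index, 2))
```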
Unit # 8
8.1 Observation

1. Participant Observation: In participant observation the researcher himself/herself
actively observes participants from within the group being observed.
Participant observation is further divided into overt observation and covert
observation.

Covert Observation: Here the researcher participates fully without informing
members of the group of the reasons for his/her presence; thus the research is carried
out secretly or covertly. Entry is gained, for example, through contact with a
gatekeeper, a member of the group under study who introduces the researcher into
the group.

Problems of Covert Observation

a) Discussion about the topic of research is difficult with members.
b) It may be personally unethical for the researcher; it is also against the ethics of
research.
c) There are problems with recording data.

Advantages
a) The researcher can gain access to groups that would otherwise not agree to being
studied.
b) It avoids the 'observer effect', the notion that individuals' behavior may change if
they know they are being studied.

Overt Observation: The researcher is open about the reasons for his/her presence in
the group being observed, and is given permission by the group to conduct the
research.
Problems with overt observation:
Observer effect, where the behavior of those under study may change due to the presence
of the researcher.

Advantages
a) The group is aware of the researcher’s role so no problem of ethics.
b) The group is observed in its natural setting.
c) Data may also be openly recorded.

2. Partial Participation
Some, but not all, of the participants know the observer.

3. Onlooker / Non-Participant Observation

Participants do not know that observations are being made or that there is
someone observing them. False explanations are given about the purpose of the
observation; the participants are deceived about its purpose.

Observation in Classroom
The teacher can use classroom observation as a tool to get feedback about the teaching-
learning process. An observant teacher can tell when students understand the content
presented or when they have difficulty grasping it.
All people, and thus certainly teachers and students, use facial expressions to form
impressions of one another. A teacher can use students' facial expressions as a valuable
source of feedback: when delivering a lecture, for example, a teacher should use
students' expressions to determine whether to slow down, speed up, or in some
other way modify the presentation. Teachers often use eye contact in the classroom to
decide who is prepared to answer a question or who has completed a homework
assignment. Through observation a teacher may also judge whether a student is
repentant or is trying to deceive the teacher, and whether a student asking a question
wants to dominate or impress the teacher or in reality does not understand the lesson.
Most experienced teachers are aware when students are bored with the subject matter
being presented. Students' eyes often signal listening and non-listening behaviors,
transmitting subtle messages about their attentiveness. Students who are constantly
looking at the wall clock or using mobile phones rather than watching and listening to
the teacher may be indicating the need for a break, the dullness of the content, or a
lack of teacher motivation and presentation. In any case, observation of eye behavior
can be used in evaluating teacher and student performance.

8.2 Interview
The interview is a technique for gathering data in qualitative research, where data
take the form of words or pictures rather than numbers.

Definitions
1. An interview is a piece of social interaction between two or more persons, with one
person asking a number of questions and the other giving answers.

2. An interview is a conversation between two or more people where questions are


asked by the interviewer to get facts or statements from the interviewee.

Preparation for Interview:


Homework: The researcher should do some homework on the people he/she is
interviewing. This will make the researcher well informed about the subject and will
also help in framing suitable interview questions.

Rapport: Before the formal interview there should be one or a few meetings between
the researcher and the subject (the person being interviewed). The purpose of these
meetings is to develop rapport. In these meetings the schedule, the format of questions
and other details of the interview can also be discussed.

Small Talk: Most interviews begin with small talk. The purpose of this chit-chat is to
develop a good relationship and environment.
Clear: The researcher should develop clear, simple, easy and short questions which are
spoken understandably. Questions should be free of words, idioms or syntax that may
affect the inference and understanding of the question and create ambiguity.

Talk Freely: Good interviews are those in which the subjects are at ease and talk freely
about their points of view. Good interviews produce detailed data filled with words that
reveal the subject's point of view.

Interest and Attention: Good interviewers communicate personal interest and attention
to subjects by being attentive, nodding their heads and using appropriate facial
expressions.

Clarification: The interviewer may ask for clarification when the respondent mentions
something that seems unfamiliar, using phrases such as “What do you mean?” “I am not
sure I am following you” “Could you explain that?”

Divert: The interview should not divert from the main topic. The interviewer should
try to keep the interview moving around the main topic.

Probing Questions: The interviewer should try to avoid questions that can be answered
with “yes” or “no”. Detail comes from probing questions, which are questions answered
by thinking more deeply about the issue. For example, “Were you a good student in
school?” can be answered in one word if the subject so chooses, but “Tell me about
what you were like as a student in school” requires description.

Silence: Interviewers need not fear silence. Silence can enable subjects to gather their
thoughts or to give some direction to the conversation.

Listen: Most important is the need to listen carefully. Every word of the subject has the
power to give information about his/her way of viewing the world. If at first the
interviewer does not understand what the subject's views are, he/she should ask for
clarification: question not to challenge, but to make clear. If the researcher still cannot
understand, the researcher should assume that the problem lies not in the subject's
wording but in the researcher's own understanding.

Flexibility: Interviewing requires flexibility. Try different techniques, including jokes
and gentle challenges. Sometimes the researcher might ask the subject to explain with
stories, and sometimes the researcher might share his/her own experiences with the
subject.

Understanding: The interviewer's purpose should not simply be that all the questions
asked are answered. The main goal of an interview is understanding, i.e. to know the
point of view of the subject. The researcher should remain focused on the goal of
gaining understanding and insight, and should be prepared for the possibility of not
getting answers to all questions.

Gentle: The interviewer should be gentle, tolerant, sensitive and patient with the
opinions of the subject.
8.5 Projective Techniques
In psychology a projective test is a personality test. Here a person responds to ambiguous
stimuli in the form of scenes, words or images. The responses are likely to
reflect hidden emotions and conflicts.

How do projective tests work?

In many projective tests, the subject is shown an ambiguous image and then asked to give
the first response that comes to mind. The key to projective tests is the ambiguity of the
stimuli. According to the theory behind such tests, clearly defined questions result in
answers that are carefully framed by the conscious mind and developed keeping social
boundaries in mind. If participants are given a question or stimulus that is not clear, the
underlying, unconscious motivations or attitudes can come into the conscious mind.

Rorschach
The most widely used and popular projective test is the Rorschach Inkblot Test, in which
a person is shown ten irregular but symmetrical inkblots and asked to explain what he or
she sees.

Thematic Apperception Test


Another popular projective test is the Thematic Apperception Test (TAT), in which the
subject views ambiguous scenes of people and is asked different questions about each
scene: for example, what is going on in the scene, what the emotions of the characters
are, and what will happen afterward.

The subject's responses are then analyzed in various ways: noting what was said, the
time taken to respond, and which aspects of the drawing were noticed; responses to
different pictures are also compared. For example, if someone repeatedly sees an image
as frightening, the subject may have paranoia.
