
Evaluation

Prepared by:
Usha Kiran Poudel
SBA, 2079
Example Question
For the evaluation of the students’
achievement, what should be considered
first?
a) Resource available
b) Content covered
c) Types of evaluation
d) Objective of the learner
Evaluation
• It is defined as a systematic process of
determining the extent to which the
instructional objectives have been achieved by
the students.
Example Question
The systematic process of assessing students' achievement is:
a) Nursing process
b) Evaluation
c) Assessment
d) Problem solving
Example Question
• The main aim of formative evaluation is
to:
a) Inform the student on his/her progress
b) Preserve anonymity
c) Permit rank ordering of the student
d) Certify the student
Example Question
• Assessing student’s progress to provide
feedback is:
a) Formative evaluation
b) Summative evaluation
c) Placement evaluation
d) Diagnostic evaluation
Example Question
• The extent to which a test measures what it is intended to measure is:
a) Validity
b) Reliability
c) Objectivity
d) Specificity
Principles of Evaluation
• Identify the instructional objectives in order to determine what is to be evaluated.
• Use a variety of evaluation techniques to get a more complete picture of students' achievements.
Cont…
• Keep in mind the limitations as well as the strengths of each evaluation technique.
• Ensure that critical intended behaviours are included as a representative sample of behaviours in the test content.
• Give clear directions.
• Select proper item difficulty.
Cont…
• Use an appropriate length of test.
• Avoid giving unintended clues.
• Avoid an identifiable pattern of answers.
• Use evaluation techniques that are practicable.
• Make proper use of evaluation results.
Types of evaluation
• In terms of how the results are used
• In terms of interpretation of test results
In terms of how the results are used
• Placement evaluation
• Formative evaluation
• Diagnostic evaluation
• Summative evaluation
Placement Evaluation
• The learner's entry-level knowledge, skills and attitudes are assessed.
• Examples: entrance examinations, pretests.
Formative Evaluation
• Students' learning progress is assessed during the period of instruction, with the idea of using the test results to provide ongoing feedback to both students and teachers regarding the success and failure of the teaching-learning process.
Diagnostic Evaluation
• It is done to find out students' persistent, recurring learning difficulties which cannot be identified by formative evaluation.
• The aim of diagnostic evaluation is to find out the causes of learning problems and to plan remedial action.
Summative Evaluation
• This type of evaluation is given at the end of a course or unit of instruction to find out to what extent each student has mastered the intended learning outcomes and is able to do his or her job well.
Cont…
• The goal of summative assessment is to evaluate student learning at the end of an instructional unit by comparing it against some standard or benchmark.
Examples of summative assessments include:
• a midterm exam
• a final exam
In terms of interpretation of test
results
• Norm-referenced test
• Criterion-referenced test
Cont…
• Norm-referenced tests (or NRTs) compare an
examinee’s performance to that of other
examinees. Standardized examinations such as
the SAT are norm-referenced tests.
• The goal is to rank the set of examinees so that
decisions about their opportunity for success
(e.g. college entrance) can be made.
Cont…
• Criterion-referenced tests (or CRTs) differ in
that each examinee’s performance is compared
to a pre-defined set of criteria or a standard.
• The goal with these tests is to determine whether or not the candidate has demonstrated mastery of a certain skill or set of skills.
• These results are usually “pass” or “fail” and
are used in making decisions about job entry,
certification, or licensure.
Example Question
• Which of the following is an example of a criterion-referenced assessment?
a) Driving test
b) Wechsler Intelligence Scale for Children
(WISC)
c) SAT
d) Graduate Record Examination (GRE)
Characteristics of evaluation tool
• Validity
• Reliability
• Objectivity
• Usability
Validity
• Validity is one of the most important characteristics of a measuring instrument.
• It is the degree to which an instrument accurately measures what it is expected to measure.
• One should be clear whether the selected instrument is the right measure for the variable and whether it includes adequate items to measure the variable.
Cont…
• For measuring the degree of dehydration, the amount of fluid intake may not be a valid measure.
Types of validity
• Face validity
• Content validity
• Construct validity
• Criterion related validity
Face validity
• It indicates whether the instrument appears to be logically appropriate, so it is also known as logical validity.
• Judging the adequacy of an instrument by merely looking at it is highly subjective and therefore weak.
• So this technique is rarely used alone in determining the validity of an instrument.
• Its only advantage is that it requires less time.
Content validity
• Content validity refers to the degree to which the test instrument represents a sample of the subject matter content and the intended learning outcomes.
• It reflects how far the test matches the course content.
• Content validity is achieved when the content
of the assessment matches the educational
objectives.
Construct validity
• It is the most complex and the highest level of
validation.
• It is concerned with validation of the construct underlying the theory or research.
• Construct validity is concerned with examining how well the instrument is actually measuring the construct it claims to be measuring.
• Construct validity can be seen as a labelling issue: does the construct's label accurately describe what the instrument actually measures?
Criterion related validity
• Criterion-related validity is concerned with establishing a relationship between the scores of an instrument and the scores of some other sound criterion.
• Criterion validity is demonstrated by the ability of the test to relate to external requirements, as in a proficiency exam.

• The instrument is said to be valid if its scores strongly correlate with the scores of some criterion.
Types of criterion related validity
• Concurrent validity
• Predictive validity
Concurrent validity
• Concurrent validity refers to the degree to which a new instrument correlates with other measures of known validity.
• It is established by comparing the new instrument with an existing, validated measure.
• A new instrument for measuring body temperature (hand-touch method) would have concurrent validity if its readings correlate highly with those of a standard measure (tympanic thermometer).
Predictive validity
• It refers to the ability of an instrument to predict the future performance of students.
• If the instrument's prediction of future behaviour is found to be accurate, the instrument is said to have high predictive validity.
• For example, an academic entrance test has predictive validity if it can predict, based on the entrance examination score, which students will do well in the academic programme.
Cont…
There are several different types of validity:
• Face validity: do the assessment items appear
to be appropriate?
• Content validity: does the assessment content
cover what you want to assess?
• Criterion-related validity: does the test relate to the external requirement or criterion you want it to?
• Construct validity: are you measuring what
you think you're measuring? And how well?
Example Question
• During staff evaluation, if the result is
consistently negative, it is called:
a) Under supervision
b) Reliability
c) Regularity
d) Validity
Reliability
• If the instrument is reliable, repeating it will
yield the same result.
• Reliability refers to the consistency of the
measurement from one measure to another.
Example Question
• Reliability is determined by all of the
following except:
a) Test-retest method
b) Observation method
c) Split half method
d) Equivalent forms method
Techniques of establishing reliability
• Test-retest (stability)
• Internal consistency (split-half)
• Equivalence
Stability (test retest)
• Stability of a measure is the extent to which the same scores are obtained when the instrument is administered to the same people on separate occasions; this is known as the test-retest method.
• If the differences between the two sets of scores are small, the instrument is considered stable or reliable.
Cont…
• The stability of the test can be measured by administering the same assessment twice with some separation in time. Similar results show high reliability, whereas different results show low reliability.
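
As a rough illustration (not part of the original slides), the test-retest idea can be expressed as a correlation between two administrations of the same test; the scores below are invented, and the plain-Python Pearson function is just one way to compute it.

def pearson_r(x, y):
    # Pearson correlation coefficient between two score lists
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

first_sitting  = [42, 55, 61, 38, 70, 49]   # hypothetical scores, occasion 1
second_sitting = [45, 53, 63, 40, 68, 50]   # same students, occasion 2

print(round(pearson_r(first_sitting, second_sitting), 2))  # close to 1 => stable (reliable) test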
Example Question
• The internal consistency of the question is
maintained by:
a) Equivalent forms method
b) Spearman-Brown formula
c) Test-retest method
d) Split-half method
Internal consistency (split half)
• It is also known as homogeneity.
• An instrument is said to have internal consistency if all its subparts measure the same attribute.
• The split-half technique is one method of assessing internal consistency: after administration, the items of a scale are split into two parts (usually odd and even items) and scored separately.
Cont…
• If the two halves show a high correlation, it is assumed that the items have high internal consistency, i.e. they measure the same attribute.
• A more accurate and advanced method of computing an internal-consistency estimate is Cronbach's alpha (coefficient alpha), which reflects the split-half correlation for all possible ways of dividing the scale into two parts, not just odd and even items.
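
The split-half idea above can be sketched with a small, hypothetical scoring script (the item scores are invented; the Spearman-Brown step is the standard correction back to full-test length):

def pearson_r(x, y):
    # Pearson correlation between the two half-test score lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx ** 0.5 * vy ** 0.5)

# Each row: one student's item scores (1 = correct, 0 = wrong) on an 8-item test
item_scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
]

odd_half  = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in item_scores]   # items 2, 4, 6, 8

r_half = pearson_r(odd_half, even_half)               # split-half correlation
r_full = 2 * r_half / (1 + r_half)                    # Spearman-Brown correction
print(round(r_half, 2), round(r_full, 2))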
Equivalence reliability
• This method is used to identify the extent of agreement between the measurements of two instruments (alternate-form or parallel-test reliability) or between two raters or observers using the same instrument (inter-rater or inter-observer reliability).
Inter- rater reliability
• It is measured by having two or more trained observers observe the same phenomenon simultaneously using the same instrument.
• The obtained scores are subjected to a reliability-coefficient computation to find out how well they agree.
• The higher the agreement, the higher the equivalence.
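
The slides do not name a specific coefficient; as one possible example, two raters' pass/fail judgements on the same students could be compared with Cohen's kappa (the ratings below are invented):

def cohens_kappa(rater_a, rater_b):
    # Agreement beyond chance between two raters using the same categories
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)

rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]

print(round(cohens_kappa(rater_1, rater_2), 2))  # closer to 1 => higher equivalence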
Parallel test reliability
• In this method, two forms of the instrument are used, both containing the same number of items at the same level of difficulty.
• One form of the test is administered to a group of students, and the second form is administered shortly thereafter to the same students.
• If the correlation coefficient between the two sets of scores is high, the two forms are considered equivalent and the instrument reliable.
Example Question
• The internal reliability of the test instrument
is measured by:
a) Test-retest method
b) Split half method
c) Equivalent forms method
d) Spearman-Brown formula
Example Question
• Stability of test is maintained by:
a) Test retest method of reliability
b) Split half method of reliability
c) Inter-rater reliability
d) Equivalence test
Objectivity
• It refers to the degree to which equally competent evaluators obtain the same result, because there is no chance for the subjective judgment of the scorer to enter into scoring the test.
• It is the extent to which several independent examiners agree on what constitutes an acceptable level of performance.
Usability
• It is concerned with the practicability of the
test instrument.
• The test should be easy to score.
• The cost involved in the development and
administration of the test measurement affects
its usability.
Example Question
• The extent to which several competent
examiners agree on what constitutes an
acceptable level of performance
a) Validity
b) Reliability
c) Objectivity
d) Specificity
Example Question
While preparing an evaluation tool which of the
following is most important?
a) It should be standardized
b) It should be based on behavior to be
measured
c) It should be easy to set up
d) It should be based on content
Example Question
• Which of the following statements is true of the essay-type measurement tool?
a) It is the best means of measuring the
psychomotor skills
b) It is the best means of measuring the creative
thinking
c) It saves time in scoring the test answer
d) It brings the objectivity in the measurement tool
Types of Measurement
• Classroom test measure
• Clinical test measure
Classroom test measure
Subjective type:
• Allows students more freedom to respond to the question.
• Learners can use their creativity in presenting facts.
Types:
• Extended response or essay type
• Short response
Cont…
• Objective type: objective tests are highly structured and limit the students' responses to the question.
Types:
• Supply type, which includes short-answer and completion items.
• Selection type, which includes alternate-response (true-false), matching, and multiple-choice questions.
Clinical test measures
• Observational techniques using rating scales, checklists and anecdotal reports
• Written reports
• Practical examinations: OSPE and OSCE
• Oral/viva voce
Planning the classroom test
• Determining the purpose of evaluation
• Specifying the content to be covered
• Building a table of specifications
• Selecting appropriate test items
Developing test design
S. No.   Objective        Marks   Percentage
1        Knowledge        10      20
2        Understanding    20      40
3        Application       8      16
4        Analysis          5      10
5        Synthesis         5      10
6        Evaluation        2       4
         Total            50     100
Weightage to content areas
S. No.   Sub-unit   Marks   Percentage
1        I          15      30
2        II         10      20
3        III        10      20
4        IV          5      10
5        V          10      20
         Total      50     100
Weightage to form of question
S. No.   Form of questions    No. of questions   Marks   Percentage
1        Objective type       25                 25      50
2        Short answer type     5                 15      30
3        Long essay type       1                 10      20
         Total                31                 50     100
Weightage to difficulty level
S. No.   Level of difficulty   Marks   Percentage
1        Easy                  10      20
2        Average               30      60
3        Difficult             10      20
         Total                 50     100
Writing the test items
• Give clear directions
• Follow the blueprint
• Construct test items that match the learning outcomes
• Keep the vocabulary as simple as possible
• Be sure the question is clear
Cont…
• Be sure that each item deals with an important aspect of the content area
• Be sure that each item is independent
• Avoid trick or catch questions
• Get the test exercise examined by one or more colleagues to establish its content validity
Constructing subjective type test
items
Essay type question
• This method is used for evaluating cognitive learning.
• In this type of test, students are free to select relevant factual information and organize their responses.
• Essays have the potential to reveal students'
abilities to reason, create, analyze, synthesize,
and evaluate.
Essay type question
An essay question should meet the following
criteria:
1. Requires examinees to compose rather than
select their response.
2. Elicits student responses that must consist of
more than one sentence.
3. Allows different or original responses or pattern
of responses.
4. Requires subjective judgment by a competent
specialist to judge the accuracy and quality of
responses.
Advantages of essay questions
1. Assess higher-order or critical thinking skills.
2. Evaluate student thinking and reasoning.
3. Provide an authentic experience.
4. Take relatively less time to prepare the test instrument.
Limitations of essay questions
1. Assess a limited sample of the range of content.
2. Are difficult and time-consuming to grade.
3. Handwriting might influence the scoring of good content.
4. Scoring becomes time-consuming.
Verbs used for essay questions:
• Compose, evaluate, defend, explain, develop,
justify
Example Question
• Increasing the number of distracters in an MCQ decreases the probability of:
a) guessing
b) interpretation
c) attention
d) motivation
Multiple choice question
• MCQs may be used for evaluating learning at the recall, comprehension, application, and analysis levels, making them adaptable for a wide range of content and learning outcomes.
There are three parts in a multiple-choice item, each with its own set of principles for development:
– Stem
– Answer (key) and
– Distracters
• The NCLEX exam is an example of an MCQ-based examination.
Cont…
• Multiple-choice questions are very popular in the evaluation of undergraduate medical students.
• They are reliable and valid; moreover they are
easy to administer to a large number of
students.
• Well constructed MCQs have a greater ability
to test knowledge and factual recall but they
are less powerful in assessing the problem
solving skills of the students.
Cont…
• A large proportion of curriculum can be tested
in a single sitting.
• The scoring is very easy and reliable using
computer software, but the construction of
good MCQs is difficult and needs expertise.
• Generally, MCQs tend to stimulate superficial, exam-oriented study.
Example question
• The tool used to assess the competencies
that are critical for gaining the proficiency
in the field is:
a) Rating scale
b) Checklist
c) Anecdotal report
d) Graphic chart
Example question
• The tendency to rate people lower than they deserve is:
a) Central tendency
b) Halo effect
c) Horn effect
d) Leniency effect
Example Question
The evaluator’s natural tendency to rate the
student at the high end of the scale is
referred as:
a) Halo effect
b) Leniency effect
c) Logical error
d) Horn effect
Common errors in rating
• Personal error(Bias)
– Leniency effect
– Horn effect
– Central tendency effect
• Halo effect
• Logical error
Personal Error
• Personal bias results from the evaluator's natural tendency to rate all students at approximately the same position on the scale.
Leniency effect
• The evaluator's natural tendency to rate students at the high end of the scale only.
• In other words, it is called the generosity error.
Horn effect
• The evaluator favors the lower end of the rating-scale continuum.
• In other words, it is called the severity error.
• Some evaluators are hypercritical perfectionists and become tight-fisted, rating students lower than they should.
Central tendency error
• The evaluator rates everyone as average, within a very narrow range, concentrating the measurements around the centre of the scale; this is called the central tendency error.
Halo effect
• It is the tendency of an evaluator to rate a student's performance as a whole on the basis of a good impression formed from one or two previous performances.
Logical error
• It results when the evaluator rates a student high on one characteristic because he/she scored high on another characteristic that is related to the one being measured.
Example question
Recording some observed meaningful events
or incidence for evaluation is:
a) Anecdotal record
b) Checklist
c) Care studies
d) Rating scale
Anecdotal report
• It is a factual description of meaningful incidents and events that the clinical supervisor observes in a student and records on plain paper. It records both positive and negative incidents.
Cont …
• Record the incident as soon as possible after the observation is made.
• Value-laden words such as "good" and "bad", which indicate subjective judgments, should be avoided.
Objective Structured Practical
Examination(OSPE)
• A well organized OSPE would test the
student’s competence in communication skills,
decision making skills, psychomotor skills and
knowledge competency simultaneously in one
setting.
• Objective Structured Practical Examination
(OSPE) is a new pattern of practical
examination.
• In OSPE each component of clinical
competence is tested uniformly and objectively
for all the students who are taking up a
practical examination at a given place.
• Through OSPE one gets a reasonable idea of
the extent of achievement of each student in
every practical skill related to a particular
discipline.
• It can be used for formative and summative
evaluation.
Example of OSPE question
Objective Structured Clinical
Examination(OSCE)
• It is a method of assessing students' clinical competency objectively.
• The whole clinical examination is divided into subparts called stations.
Difference between OSPE and OSCE
Feature                 OSPE                                      OSCE
Domain                  Higher level of knowledge                 Psychomotor
Tests ability to        Identify structures on a radiograph       Apply structural knowledge to
                        and relate them to a clinical scenario    perform examinations and procedures
Examiner                Nonspecific, 2-3 personnel per hall       Expert examiner for each station
Checklist               Not required                              Required
Standardized patient    Not required                              Required
Thank You
