DEPARTMENTAL SEMINAR ON

CRITERION AND NORM REFERENCED EVALUATION

CHAIRPERSON: Dr. SAILAXMI GANDHI, Assistant Professor, Department of Nursing, NIMHANS

PRESENTER: Mr. JINTO PHILIP, I Year MSc Nursing, NIMHANS

VENUE: ARTS THEATRE

DATE: 31/08/2010

TIME: 3:00 PM
INDEX
1. INTRODUCTION
2. EDUCATIONAL EVALUATION
2.1. EDUCATION
2.2. EVALUATION
2.3. EDUCATIONAL EVALUATION
3. TYPES OF EVALUATION
4. INTERPRETING TEST SCORES
4.1. CRITERION-REFERENCED TESTS (CRTs)
4.1.1. DEFINITION
4.1.2. SETTING PASS/FAIL SCORE FOR CRTs
4.1.3. RELATIONSHIP OF CRTs WITH MASTERY AND DEVELOPMENTAL LEVEL LEARNING
4.1.3.1. MASTERY LEVEL LEARNING
4.1.3.2. DEVELOPMENTAL LEVEL LEARNING
4.1.4. STEPS IN CONSTRUCTING CRTs
4.1.5. CHARACTERISTICS OF CRITERION-REFERENCED TEST
4.1.6. USES OF CRITERION-REFERENCED TEST
4.1.7. LIMITATIONS OF CRITERION-REFERENCED TEST
4.1.8. ADMINISTRATION
4.1.9. TEST LAYOUT
4.1.10. RELIABILITY
4.1.11. CUT-OFF SCORES
4.1.12. VALIDITY
4.2. NORM-REFERENCED TESTS (NRTs)
4.2.1. DEFINITION
4.2.2. SETTING PASS/FAIL SCORE FOR NRTs
4.2.3. STEPS IN CONSTRUCTING NRTs
4.2.4. CHARACTERISTICS OF NORM-REFERENCED TESTS
4.2.5. USES OF NORM-REFERENCED TEST
4.2.6. LIMITATIONS OF NORM-REFERENCED TEST
4.2.7. RELIABILITY
4.2.8. VALIDITY
5. CRITERION-REFERENCED TEST Vs NORM-REFERENCED TEST
5.1. SIMILARITIES
5.2. DIFFERENCES
6. RESEARCH ABSTRACT
7. CONCLUSION
8. REFERENCE

1. INTRODUCTION
Education aims at the all-round development of a student, not merely at imparting knowledge to
him. It is therefore necessary that teachers and educators be equipped not only with subject
matter and dynamic methods of teaching but also with objectives and appropriate testing devices that
will assess a student's ability. Sound student evaluation is an essential ingredient of strong educational
programs. Evaluation is probably the most common and pervasive aspect of student instruction. It is the
primary tool for guiding student development, crossing all academic disciplines. Certainly, evaluations of
students occur in all classrooms and regularly confront students and educators in a wide variety of
decision situations that affect their educational development. These decisions include matriculation,
admissions, grading, tracking, and instructional decisions for individual students, discipline, and merit
awards.10

2. EDUCATIONAL EVALUATION
2.1. Education

Education in the largest sense is any act or experience that has a formative effect on the mind, character
or physical ability of an individual. In its technical sense, education is the process by which society
deliberately transmits its accumulated knowledge, skills and values from one generation to another.6

According to Pestalozzi (1819), “Education is the natural, harmonious and progressive development of
man’s innate powers”.10

As per John Dewey (1916), “education is the development of all those capacities in the individual which
will enable him to control his environment and fulfill his responsibilities”.10

According to Mahatma Gandhi (1927), “Education is the all-round drawing out of the best in child and man –
body, mind and spirit”.10

2.2. Evaluation

The term evaluation is derived from the word ‘valoir’, which means to be worthy. Thus evaluation is the
process of judging the value or worth of an individual's achievements or characteristics. Evaluation is
the systematic determination of the merit, worth and significance of something or someone, using criteria
against a set of standards.18

“Evaluation is the determination of the worth of a thing. It includes obtaining information for use in
judging the worth of a program, product, procedure or objective or the potential utility of alternative
approaches to attain specific objectives” (Worthen & Sanders, 1974)5

2.3. Educational Evaluation

Educational evaluation is concerned with judging the value or worth of the goals attained by the
education system.

“Evaluation is the process of determining to what extent the educational objectives are being
realized” (Ralph Tyler, 1950)10

“Evaluation is the systematic process of determining the degree to which changes in the behavior of
students are actually taking place” (Tyler, 1951)6

3. TYPES OF EVALUATION

We can classify evaluation in the following ways:

• Based on frequency of conducting evaluation
a) Formative evaluation
b) Summative evaluation
• Based on nature of measurement
a) Maximum performance evaluation
b) Typical performance evaluation
• Based on method of interpreting results
a) Criterion-referenced evaluation
b) Norm-referenced evaluation 19

4. INTERPRETING TEST SCORES

A raw test score is meaningless without a framework for interpretation. When people take an
assessment, it is important for them to understand the implications of their scores, particularly when
passing or failing makes a major difference in their lives. There are two frameworks for interpreting an
assessment score, referred to as criterion-referenced and norm-referenced. The raw test score is only given
meaning within the instructional content domain it represents. Criterion-referenced tests assess an
individual's performance based on the percentage of content mastered. Norm-referenced tests define an
individual's performance by comparing it with the performance of others. Although both types of interpretation
can be applied to the same test, the interpretation is more meaningful when the test is specifically designed
for a desired interpretation (Linn & Gronlund, 2000).5

4.1. CRITERION REFERENCED TESTS (CRTs)

The concept of criterion-referenced measurement was introduced by Glaser (1963) and Popham &
Husek (1969) to highlight the need for tests that can describe the position of a learner on a performance
continuum, rather than the learner's rank within a group of learners. The word criterion in CRT refers
to a domain of behaviors; in criterion referencing, one is interested in referencing an examinee's test
performance to a well-defined domain of behaviors measuring an objective or skill.5

4.1.1. Definition

“Criterion-referenced tests are constructed to provide information about the level of an examinee’s
performance in relation to clearly defined domain of content and/or behaviors”.

- Popham, 1978.5

“Criterion-referenced assessment refers to testing against an absolute standard, such as an individual's
performance against a benchmark”.

- Wojtczak, 2002.18

Criterion-referenced tests interpret a student's raw score using a preset standard established by the
faculty. Thus each student's competency in relation to the preset standard is measured without reference
to any other student. Student scores are then reported as the percentage correct, with each student's
performance level determined by the preset, or absolute, standard. An example of a criterion-referenced
objective is given below.5

The student demonstrated mastery by correctly identifying 90 percent of the terms

With a criterion referenced score interpretation, the test designers have established an acceptable
standard for setting the pass or fail score. If someone passes this test, they are determined to be
qualified, whether it’s as a surgeon or a nurse — whatever job competencies the test measures.

Figure: Typical mastery curve for a criterion-referenced test.4

This curve shows the number of people who took the assessment and the scores they achieved. The
bottom scale runs from test scores of zero up to 100, while the left-hand side of the scale shows the
number of people who achieved a particular score. The cut score has been determined to be around 70
percent, which was probably set by subject matter experts who determined the competency required
to pass the exam.

With a criterion-referenced score interpretation, more or fewer people will qualify from examination event
to examination event, since each sitting will include candidates with more or less knowledge. What is
important, however, is that a benchmark has been established for the standards required for a particular
job. For example, a driving test uses a criterion-referenced score interpretation, as a certain level of
knowledge and skill has been determined to be acceptable for passing. 2,3,4

Normally, performance standards are set on the test score scale so that examinees' scores on a CRT
can be scored or classified into performance categories, such as “failing”, “basic”, “proficient” and
“advanced”. Today extensive use is made of CRTs, and students' scores from these tests are often
scored into three or five performance levels or categories on state proficiency tests. With still other
CRTs, test scores are combined into a single “pass” or “fail” score. For example, the criterion may be
“students should be able to correctly add two single-digit numbers”, and the cut-off score may be that
students should correctly answer a minimum of 80% of the questions to pass. 5
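
As an illustration of how such cut-offs and performance categories translate into a scoring rule, the following minimal Python sketch maps a percent-correct score to a category. The thresholds and labels are assumptions chosen for the example, not standards taken from the sources cited above.

def classify_crt_score(percent_correct, cuts=None):
    """Map a CRT percent-correct score to a performance category.

    `cuts` maps each category's lower bound to its label; the thresholds
    below are illustrative assumptions, not prescribed by any standard.
    """
    if cuts is None:
        cuts = {80: "advanced", 65: "proficient", 50: "basic", 0: "failing"}
    for lower_bound in sorted(cuts, reverse=True):
        if percent_correct >= lower_bound:
            return cuts[lower_bound]
    return "failing"

# Example: a student correctly answers 17 of 20 single-digit addition items
score = 17 / 20 * 100             # 85 percent correct
print(classify_crt_score(score))  # -> advanced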

4.1.2. Setting pass/fail score for criterion-referenced tests

One method is to compare students who are assumed not to know much about a subject with subject
matter experts (SMEs) who do.

Figure: Setting the pass/fail score for a criterion-referenced test.
Source: Journal of University Teaching & Learning Practice. 2009; Volume 6, Issue 2, Article 6

These SMEs perform well on the job, and if we look at the overall results, fewer students
performed well compared with the experts. For students the bell curve sits toward the
lower end of the score scale, while for subject matter experts the scores move to the higher end. One
technique for setting the pass/fail score is to choose the score where these two curves intersect. In the
example above that would be at about 80%. The score at the intersection minimizes the number of test
takers in these groups that were misclassified by the test, i.e., it minimizes the number of non-experts who
passed and the number of SMEs who failed. 2,3,4
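
A minimal sketch of this intersection idea, assuming two small made-up score samples: it searches the observed scores for the cut that minimizes the count of non-experts who would pass plus experts who would fail.

def best_cut_score(novice_scores, expert_scores):
    """Return the candidate cut score that minimizes misclassifications:
    novices who would pass plus experts who would fail."""
    candidates = sorted(set(novice_scores) | set(expert_scores))

    def errors(cut):
        false_pass = sum(s >= cut for s in novice_scores)  # non-experts passing
        false_fail = sum(s < cut for s in expert_scores)   # experts failing
        return false_pass + false_fail

    return min(candidates, key=errors)

# Illustrative, made-up percentage scores for students and subject matter experts
students = [55, 60, 62, 68, 70, 72, 75, 78, 81]
experts = [74, 79, 82, 85, 88, 90, 92, 95]
print(best_cut_score(students, experts))  # a cut near where the two groups overlap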

If the pass/fail decision is determined only by a single score, then a single set of results can be used as
evidence. However, if multiple cut score levels are used for the various topics in the test then multiple
sets of scores must be examined.

Because CRTs measure a student's attainment of a set of learning outcomes, no attempt should be
made to eliminate easy items. Today CRTs go by many names: “domain-referenced tests”,
“competency tests”, “basic skill tests”, “mastery tests”, “performance assessments”, “authentic tests”,
“proficiency tests”, “standards-based tests”, “licensure exams” and “certification exams”. CRTs today
typically use a wide array of item types, ranging from multiple-choice and true-false items to essays,
complex performance tasks and simulations.

CRTs are often teacher-made and are closely tied to the objectives and curriculum. They are most
meaningful when they are specifically designed to measure student ability in a particular area (Gronlund,
1973). Competency is such a critical requirement in nursing education that CRT is often the preferred
form of classroom testing in nursing education (Reilly & Oermann, 1990). With criterion-referenced
clinical evaluation, student performance is compared against preset criteria. In some nursing courses
these criteria are the clinical objectives to be met in the course.5

4.1.3. Relationship of CRTs with mastery and developmental level learning

Gronlund (1973) describes the relationship of criterion-referenced testing to the two levels of learning:
Mastery and Developmental.

4.1.3.1. Mastery level learning

At this level CRTs are concerned with the minimum essential skills that indicate mastery of an
objective. The scope of the learning task is limited, which simplifies the process of assessment. A
percentage-correct score is usually used to identify how closely a student's performance approaches
complete mastery of an objective. One challenge for the faculty is to identify a) which specific learning
outcomes the students are expected to master, and b) which objectives represent learning beyond
the mastery level, or developmental learning (Gronlund, 1973).5,6

4.1.3.2. Developmental level learning

The concept of developmental learning applies to constructs that represent complex, higher-order
thinking, such as critical thinking. The abilities associated with these levels are continuously
developing throughout life. Developmental level objectives represent goals to work toward, with
emphasis focused on continuous development, rather than a complete mastery of a set of
predetermined skills (Gronlund, 1973).

Learning outcomes at the developmental level represent degrees of progress towards an objective.
Because it is impossible to identify all the behaviors that represent a complex construct, only a
sample of the behaviors associated with instructional objectives at this level can be identified as
learning outcomes. These behaviors should define the construct and provide a representative sample
of student performance that will be accepted as evidence of appropriate progress towards the
attainment of the ultimate objective.

Students are not expected to fully achieve objectives at the developmental level. However, they are
required to demonstrate the behaviors represented by the learning outcomes, and they are also
encouraged to strive for their personal level of maximum achievement towards the ultimate objective:
their personal best. At this level instructional objectives can be designed to show the development of
students as they progress through an instructional program. For example, the same general
objectives can be used in every course in a nursing programme, with the learning outcomes
becoming more complex as students progress through the program.

Gronlund (1973) asserts that the use of criterion-referenced tests is limited in the assessment of
developmental learning. While test preparation should follow mastery-level procedures, he suggests
that, in order to adequately describe student performance beyond the minimal essentials, tests at the
developmental level should include items of varying difficulty and allow for both criterion- and norm-
referenced interpretations. 5,6

4.1.4. Steps in constructing CRTs

When CRT was introduced by Glaser (1963) and Popham & Husek (1969), the goal was to assess
the examinee’s performance in relation to a set of behavioral objectives. Over the years, it became
clear that behavioral objectives did not have the specificity needed to guide instruction or to serve as
targets for test development and test score interpretation (Popham, 1978). Numerous attempts were
made to increase the clarity of behavioral objectives, including the development of detailed domain
specifications that included a clearly written objective, a sample test item or two, detailed
specifications for appropriate content, and details on the construction of relevant assessment
materials (Hamilton, 1998). The more recent trend in CRT practice has been to write objectives focused on
the more important educational outcomes (fewer instructional and assessment targets seem to be
preferable) and then to offer a couple of sample assessments, preferably samples showing the
diversity of approaches that might be used for assessment (Popham, 2000).

The steps in constructing a CRT are as follows:

1. Preliminary considerations
a. Specify test purposes, and describe domain of content and/or behaviors that are of
interest.
b. Specify group of examinees to be measured, and any special testing requirements
resulting from examinee’s age, race, gender, socio-economic status, linguistic
differences, and disabilities and so on.
c. Determine the time and financial resources available (or specify them if not given)
for constructing and validating the test.
d. Identify and select qualified staff members (note any individual strengths and their
role in test development)
e. Specify an initial estimate of test length (include number of test items and/or tasks,
as well as time requirements for their development), and set a schedule for
completing steps in the test development and validation process.
2. Review of content domain/behaviors of interest (or prepare if not available)
a. Review the descriptions of the content standard or objectives to determine their
acceptability for inclusion in the test.
b. Select the final group of objectives (i.e., finalize the content standards) to be included in
the test.
c. Prepare item specifications for each objective (or something equivalent, to lay out
the content clearly) and review them for completeness, accuracy, clarity and
practicability.
3. Item/task writing and preparation of any scoring rubrics that are needed.
a. Draft a sufficient number of items and/or tasks for field testing.
b. Carry out item/task editing, and review scoring rubric.
4. Assessment of content validity.
a. Identify a pool of judges and measurement specialists.
b. Review the test items and tasks to determine their match to the objectives, their
representativeness, and their freedom from stereotyping and potential bias (items/tasks
show potential bias when aspects of the assessment place one group at a
disadvantage, perhaps because of the choice of language or situation).
c. Review the test items and/or tasks to determine their technical adequacy (does the
assessment material measure the content standards of interest?).
5. Revision of test item/tasks.
a. Based upon data from steps 4b and 4c, revise test items/tasks (when possible and
necessary) or delete them.
b. Write additional test items/tasks (if needed), and repeat step 4.
6. Field test administration (sometimes carried out within the context of an ongoing test
administration).
a. Organize the test items or tasks into forms for field testing.
b. Administer the test forms to appropriately chosen groups of examinees (i.e.,
groups like those for whom the final test is intended).
c. Conduct item analysis and item bias studies (usually called “studies to identify
differentially functioning test items”).
d. If statistical linking or equating of forms is needed, this step might be done here.
7. Revisions to test items or tasks.
a. Revise test items or tasks when necessary or delete them, using the results from
step 6c. Also check scoring rubrics for any performance tasks being field tested.

8. Test assembly.

a. Determine the test length, the number of forms needed, and the number of
items/tasks per objective.
b. Select test items/tasks from available pool of valid test material.
c. Prepare test directions, practice questions (when necessary), test booklet layout,
scoring keys, answer sheets and so on.
d. Specify modifications to instructions, medium of presentation or examinees'
responses, and time requirements that may be necessary for examinees with
special needs.
e. Include anchor test items if the test is being statistically linked to a previous test or
tests.
9. Selection of performance standards.
a. Determine whether performance standards are needed to accomplish the test
purpose (usually they are).
b. Initiate (and document) a process to determine the performance standards for
separating examinees into performance categories. Compile procedural, internal
and external validity evidence to support the performance standards (Cizek, 2001).
c. Specify considerations that may affect the performance standards when they are
applied to examinees with special needs (i.e., alternative administration or other
modifications to accommodate such examinees).
d. Identify alternative test score interpretations for examinees requiring alternative
administration or other modifications.
10. Pilot test administration (if possible and if relevant; sometimes this step is replaced with an actual
test administration).
a. Design the test administration to collect score reliability and validity information.
b. Administer the test form(s) to appropriately chosen groups of examinees.
c. Identify and evaluate alternative administration/other modifications, to meet
individual specific needs that may affect validity and reliability of the test or forms of
the test.
d. Evaluate the test administration procedures, test items/tasks, and score reliability
and validity.
e. Make final revisions to the test or forms of the test, based on the available
technical data.
11. Preparation of manuals.
a. Prepare a test administrator’s manual.
b. Prepare a technical manual.
12. Additional technical data collection.

a. Conduct reliability and validity investigations on a continuing basis.1,8

4.1.5. Characteristics of criterion-referenced test

1. Its main objective is to measure students' achievement of curriculum-based skills.
2. It is prepared for a particular grade or course level.
3. It is used to evaluate the curriculum plan, instruction progress and group student interaction.
4. It has balanced representation of goals and objectives.
5. It can be administered before and after instruction.

6. It is generally reported in the form of:

a) Minimum scores for partial and total mastery of main skill areas.
b) Number of correct items.
c) Percent of correct items.
d) Derived score based on correct items and other factors.2

4.1.6. Uses of criterion-referenced test

Today CRTs are widely used in education, credentialing, the armed services, and industry.

1. Criterion-referenced measures are particularly useful when the purpose of testing is to ascertain
whether an individual has attained critical clinical competencies or minimum requirements, such
as for practice or for admission to a specific educational program or course. Special educators
use CRTs with individualized education programmes to monitor student progress and achievement.
2. Classroom teachers use CRTs, both in their day-to-day management of student progress and in
their evaluation of instructional approaches.
3. In credentialing, CRTs are used to identify persons who have met the test performance
requirement for a license or certificate to practice in a profession (e.g., medical practice, nursing,
teaching, accountancy, or language proficiency certification such as IELTS and TOEFL). The NCLEX
examination is probably the most commonly used criterion-referenced test in nursing. A standard is
set and each individual must score at that level or above in order to attain nursing licensure.
4. In educational programs criterion-referenced measurement is best used when there is a need for
tests to examine student progress towards attainment of a designated skill or knowledge level.
5. The application of criterion-referenced measurement for ascertaining clinical skills requires each
student to demonstrate critical behaviors before performance would be considered satisfactory.
6. Criterion-referenced tests are used to determine which students are eligible for promotion to the
next grade or graduation.
7. In clinical practice, criterion-referenced measures are sometimes used to determine a client's ability to
perform specific tasks and skills and to categorize clients with regard to their health status or
diagnosis. For example, the intensity of a heart murmur can be classified as grade 1, 2, 3, 4, 5 or 6,
the reflexes elicited during a neurological exam may be graded 0, 1+, 2+, 3+ or 4+, and a patient
may be categorized as hypotensive, normotensive or hypertensive based on the blood pressure
level. The criterion standards applied during the classification process have been explicated and
incorporated into these procedures so that results will be as accurate as possible (a minimal
illustrative sketch of this kind of classification is given after this list).11,12
8. In the armed forces & industry, CRTs are used to identify the training needs of individuals, to
judge people’s job competence, and to determine whether or not trainees have successfully
completed training programs.
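
As referenced in point 7 above, here is a minimal sketch of a criterion-referenced clinical classification; the systolic cut points used are illustrative assumptions for the example, not clinical guidance from the source.

def classify_blood_pressure(systolic_mm_hg):
    """Criterion-referenced classification of a single systolic reading.

    The cut points are illustrative assumptions only; actual clinical
    guidelines define the criterion standards used in practice."""
    if systolic_mm_hg < 90:
        return "hypotensive"
    if systolic_mm_hg < 140:
        return "normotensive"
    return "hypertensive"

print(classify_blood_pressure(118))  # -> normotensive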

4.1.7. Limitations of criterion-referenced test

Chase (1974) lists the following limitations:

1. A criterion-referenced test tells only whether a learner has reached proficiency in a task area, but
does not show how good or poor the learner's level of ability is.
2. Tasks included in a criterion-referenced test may be highly influenced by a given teacher's
interests or biases, leading to a general validity problem.
3. Only some areas readily lend themselves to listing the specific behavioral objectives around which
criterion-referenced tests can be built, and this may be an obstacle for teachers.
4. Criterion-referenced tests are important for only a small fraction of important educational
achievements. On the other hand, promotion and the assessment of various skills is a very important
function, and it requires norm-referenced testing.1,8

4.1.8. Administration

1. The test manual should specify the role and responsibilities of the examiner.
2. The test administrators should have adequate information relating to the purpose, time limits,
answer sheets and scoring of the test.
3. The directions of the test should be clear.
4. The test should be easy to score.1

4.1.9. Test layout

1. The test booklet should be attractively printed.
2. The layout of the test booklet should be convenient for examiners.8

4.1.10. Reliability

1. The test length should be sufficient to allow test score reliability to be determined.
2. The sample of examinees used in determining reliability should be adequate and representative.
3. Reliability information should be provided for each intended use of the test score.
4. The reliability information provided should be appropriate for the intended use of the test
score.1

4.1.11. Cut-off scores

1. There should be a rationale for the selection of the method used to determine the cut-off score.
2. There should be evidence for the validity of the chosen cut-off score.8

4.1.12. Validity

1. The validity evidence should be adequate for the intended use of the test scores.
2. The test manual should provide an appropriate discussion of the factors affecting the validity of the
scores.8

4.2. NORM-REFERENCED TESTS (NRTs)

While CRTs measure a student's achievement without reference to other students, the aim of an NRT
(norm-referenced test) is to compare a student's achievement with the achievement of the
student's peer group. The word norm in NRT refers to a designated standard of average performance
of people of a given age, background, etc. The representative group is known as the ‘norm group’. The
norm group may be made up of examinees at the local, district, state or national level.5

4.2.1. Definition

“Norm-referenced test is a test designed to measure the growth in a student's attainment and to compare
his level of attainment with the levels reached by other students in the norm group”

- Bormuth (1970)5

“Norm-referenced test is a test designed to provide a measure of performance that is interpretable in
terms of an individual's relative standing in a group”

- Gronlund (1976)16

NRTs focus on a student's performance in relation to other students, rather than in relation to the
objectives of the course. Norms themselves do not represent levels of performance; they provide a frame
of reference to use when comparing the performances of a group of individuals. NRTs interpret a
student's raw score as a percentile rank in a group. NRTs do not indicate what a student has achieved; the
test only indicates how the student compares to other students in the group. An example of a norm-
referenced score is shown below.

The student's performance equaled or exceeded 82 percent of students in the group

A norm referenced test compares the scores of examinees against the scores of other examinees. Often,
typical scores achieved by identified groups in the population of test takers, i.e., the so-called norms, are
published for these tests. Norm-referenced tests are used to make “selection decisions.” For example, a
nursing college entrance exam might be designed to select applicants for 100 available seats in a
college. College decision makers use the test scores to identify the 100 best performers among those taking
the test to fill those seats. In some years a higher-quality group of students will qualify, and in others a
lower-quality group. The key, however, is that the test will spread the examinees' scores out from one
another so that the 100 best performers will be readily identifiable.2, 3, 4
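
To make the percentile-rank interpretation mentioned above concrete, here is a minimal sketch with a made-up norm group; note that conventions differ on whether the examinee's own score counts as "at or below", so this is one common variant rather than the only definition.

def percentile_rank(raw_score, group_scores):
    """Percent of the norm group scoring at or below `raw_score`."""
    at_or_below = sum(s <= raw_score for s in group_scores)
    return 100 * at_or_below / len(group_scores)

# Illustrative, made-up raw scores for a norm group of ten classmates
norm_group = [32, 35, 38, 40, 41, 44, 45, 47, 50, 52]
print(percentile_rank(45, norm_group))  # -> 70.0: equals or exceeds 70% of the group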

Figure: Typical curve for a norm-referenced test.4


This curve shows the number of people who took the assessment and the scores they achieved. The
bottom scale goes from test scores of zero up to 100. The cut score is determined by the number of
available seats, here 100, so that the 100 best performers will be readily identified.

4.2.2. Setting pass/fail score for norm-referenced tests

Setting the pass/fail score for a norm-referenced test is fairly simple. First, determine how many people
should pass. A report shows how many people reached each score, enabling test administrators to select
the top set of candidates. For example, using the table below, if 1,000 students should pass for
graduation to the next level of their studies, a passing score of 78.6% would achieve that result.

Scores Number of candidates


0% and above 1,500
77% or above 1,318
78% or above 1,214
78.1% or above 1,156
78.2% or above 1,034
78.3% or above 1,028
78.4% or above 1,015
78.5% or above 1,004
78.6% or above 1,000
78.7% or above 998
78.8% or above 993
78.9% or above 982
79% or above 961
Table: Setting the pass/fail score for a norm-referenced test.
Source: Journal of University Teaching & Learning Practice. 2009; Volume 6, Issue 2, Article 6

NRTs are designed to discriminate between strong and weak students. The tests are designed to provide
a wide range of scores so that the identification of students at different achievement levels is possible.
Therefore, items that all students are likely to answer correctly are eliminated.2,3,4
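
A minimal sketch of this top-N selection, using made-up scores rather than the table above: the cut score is simply the score of the last admitted candidate once all scores are ranked.

import random

def norm_referenced_cut(scores, seats):
    """Cut score admitting roughly `seats` candidates: the score of the
    last admitted candidate after ranking everyone from high to low.
    Ties at the cut can push the admitted count slightly above `seats`."""
    ranked = sorted(scores, reverse=True)
    return ranked[seats - 1] if seats <= len(ranked) else min(ranked)

# Illustrative, made-up percentages for 1,500 examinees; pass the top 1,000
random.seed(0)
all_scores = [round(random.uniform(40, 100), 1) for _ in range(1500)]
cut = norm_referenced_cut(all_scores, 1000)
print(cut, sum(s >= cut for s in all_scores))  # chosen cut and how many reach it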

The NRT format is commonly used on national standardized tests. These tests have a
generalized content that is commonly taught in different areas. The scores provide a general indication of
strengths and weaknesses of the students in a particular institution, and afford the faculty an external
reference point for comparing their curriculum to a composite national curriculum. Because NRTs are not
concerned with the level of individual student achievement, they are usually not appropriate for classroom
use. When assessing developmental learning, Gronlund (1973) suggests using NRTs to rank the
students with the addition of criterion-referenced interpretations applied to the test to assess degrees of
student progress towards an achievement. In clinical settings, norm-referenced interpretations compare a
student’s clinical performance with those of a group of learners, indicating that the student has more or
less clinical competence than others in the group.5

4.2.3. Steps in constructing NRTs

L. M. Carey (1988) has described the following stages for the development of an NRT.

• Design stage: It is done through:
i. Curriculum analysis.
ii. Selecting objectives to be measured.
iii. Analyzing objectives for determining pre-requisite skills.
iv. Developing a table of specifications for the test.
v. Determining specifications for items.
• Development stage: This consists of:
i. Writing items according to specifications.
ii. Developing needed art work and illustrations.
iii. Writing response directions and examples.
iv. Writing administrative directions.
v. Reviewing items, illustrations and directions.
vi. Developing the test layout.
vii. Developing a sample test.
• Conducting field test: At this stage the test is tried out through:
i. Selecting a representative group.
ii. Administering the test.
iii. Scoring.
iv. Analyzing information.
v. Analyzing data and selecting items.
vi. Developing the final test form.
• Developing test norms: Norms of the test are developed through:
i. Describing characteristics of the population.
ii. Selecting a representative norm group.
iii. Administering the test to the norm group.
iv. Scoring.
v. Converting raw scores to standard scores.
vi. Creating norm groups.
• Writing the test manual: The test manual is written by describing:
i. The design process and skills measured.
ii. The field test process.
iii. The developmental process.
iv. Criteria used to select items.
v. Norm group selection procedure.
vi. Norm group characteristics.
vii. Test characteristics: reliability and standard error of measurement.
viii. Standard administrative procedures.
ix. Scoring procedures and derivation of standard scores.
x. Score interpretation procedures.1,8

4.2.4. Characteristics of norm-referenced test

1. Its basic purpose is to measure students' achievement in curriculum-based skills.
2. It is prepared for a particular grade level.
3. It is administered after instruction.
4. It is used for forming homogeneous or heterogeneous class groups.
5. It classifies achievement as above average or below average for a given grade.7

4.2.5. Uses of norm-referenced test

1. Used to make comparisons within groups or with external groups, and to use the data for
predictive purposes, such as admission criteria for an educational institution.
2. In aptitude testing, for making differential predictions.
3. To get a reliable rank ordering of the pupils with respect to the achievement being measured.
4. To identify the pupils who have mastered the essentials of the course more than others.
5. To select the best of the applicants for a particular programme.
6. To find out how effective a programme is in comparison with other possible programmes.
7. Can be used in the selection of nursing staff for a hospital.12

4.2.6. Limitations of norm- referenced test

1. Norm-referenced grading tells us nothing about standards.
2. Students' actual achievement may not be acknowledged.
3. Students do not know where they stand until the course is over.
4. Norm-referencing encourages poor evaluation practices.11

4.2.7. Reliability

Test length affects reliability: other things being equal, the reliability of a test can be increased by
increasing its length. Items of similar content also increase reliability. Items of moderate difficulty
increase reliability compared with items that are either too easy or too difficult. An increased range of
performance among the examinees being tested also tends to increase reliability.15
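
The usual way to quantify the effect of test length on reliability is the Spearman-Brown prophecy formula (not given in the source); the sketch below applies it to an assumed current reliability of 0.70.

def spearman_brown(current_reliability, length_factor):
    """Predicted reliability after changing test length by `length_factor`
    (e.g. 2.0 doubles the number of parallel items)."""
    r, n = current_reliability, length_factor
    return n * r / (1 + (n - 1) * r)

# Doubling a test with reliability 0.70 is predicted to raise it to about 0.82
print(round(spearman_brown(0.70, 2.0), 2))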

4.2.8. Validity

Validity of a norm-referenced test can be increased by:

i. Constructing items of proper difficulty level.
ii. Increasing the test length.
iii. Increasing the heterogeneity of the group.
iv. Administering the test under proper conditions.15

5. CRITERION-REFERENCED TEST (CRT) Vs NORM-REFERENCED TEST (NRT)

J. C. Stanley and K. D. Hopkins (1972) have stated: “The word criterion in CRT denotes an instructional
objective, an expected post-instructional outcome, an intended level of student performance, an
acceptable level of learner achievement or a desired standard of product of performance.” By norm-
referenced testing is meant the measurement of a student's achievement in terms of a group, a class, a
school or a state, which is taken as the referent for interpreting the student's scores and for passing
judgments. Thus the typical performance of a norm group is used as the basis for judging individual
student learning. In CRT the emphasis is on the improvement of students' achievement, while in NRT the
emphasis is on the measurement of achievement.16

5.1. Similarities

1. Achievement domain is measured in both.


2. Same types of items can be used in both.
3. The same rules are followed for writing items in both, except for item difficulty.
4. Sample of test items should be relevant and representative in both.1,8

5.2. Differences

Many educators and members of the public fail to grasp the distinctions between criterion-referenced and
norm-referenced testing. It is common to hear the two types of testing referred to as if they serve the
same purposes or share the same characteristics. Much confusion can be eliminated if the basic
differences are understood. Popham (1975) compared CRTs and NRTs as follows: 1,8

Purpose
Criterion-Referenced Tests: To determine whether each student has achieved specific skills or concepts; to find out how much students know before instruction begins and after it has finished.
Norm-Referenced Tests: To rank each student with respect to the achievement of others in broad areas of knowledge; to discriminate between high and low achievers.

Content
Criterion-Referenced Tests: Measure specific skills which make up a designated curriculum. These skills are identified by teachers and curriculum experts. Each skill is expressed as an instructional objective.
Norm-Referenced Tests: Measure broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.

Item Characteristics
Criterion-Referenced Tests: Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing. The items which test any given skill are parallel in difficulty.
Norm-Referenced Tests: Each skill is usually tested by fewer than four items. Items vary in difficulty. Items are selected that discriminate between high and low achievers.

Score Interpretation
Criterion-Referenced Tests: Each individual is compared with a preset standard for acceptable achievement; the performance of other examinees is irrelevant. A student's score is usually expressed as a percentage. Student achievement is reported for individual skills.
Norm-Referenced Tests: Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade-equivalent score, or a stanine. Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills.

Source: Popham, J. W. (1975). Educational evaluation. Englewood Cliffs, New Jersey: Prentice-Hall, Inc. 17

6. RESEARCH ABSTRACT

6.1. Comprehensive practicum evaluation across a nursing program

In an article published by the National League for Nursing in Nursing Education Perspectives in May 2004,
the authors Deanna L. Reising and Lynn E. Devich say: “With the inception of a new competency-based
nursing curriculum, faculty in a baccalaureate nursing program developed a comprehensive laboratory
and clinical evaluation program aimed at progressive, criterion-based evaluation across four semesters of
the nursing program. This article provides background for the development of the program, the resources
needed, and specific evaluation activities for the four semesters targeted. Course content and program
year competencies, progressively built from one semester to the next, guided the design of the practicum
evaluations. Faculty report satisfaction with the ability of this program to determine whether student
performance is consistent with competency achievement. Refinements have been made to alleviate
student stress and improve evaluator consistency.”

6.2. Seeking quality in criterion referenced assessment

A paper presented at the Learning Communities and Assessment Cultures Conference organized by the
EARLI Special Interest Group on Assessment and Evaluation, University of Northumbria, August 2002,
by Lee Dunn, Sharon Parry and Chris Morgan states that “Over the past decade, traditional norm
referenced methods of assessment have come into question, and criterion referenced assessment in
undergraduate education has gathered considerable momentum as a method of marking, grading and
reporting students' achievements. The value of criterion referencing lies in its capacity to achieve greater
transparency in marking and the descriptors it gives us for the abilities and achievements of learners.
While the notion of marking and grading against explicit criteria and standards may seem a relatively
simple concept, it is complex conceptually and involves a range of problematic assumptions. This paper
explores some of the difficulties with implementing criterion referenced assessment, including difficulties
in articulating clear and appropriate standards, problems with the alignment of criteria with other elements
of the subject or program, and the competence and confidence of university teachers in exercising
professional judgment. It is argued that quality and authenticity in criterion referenced assessment are
elusive goals and that understanding its guiding principles is not enough. Criterion referenced
assessment must be placed in its disciplinary context.”

6.3. Criterion-referenced and norm-referenced assessment of minority group children


An article published by Clifford J. Drew, Associate Professor, Department of Special Education, University
of Utah, Salt Lake City, USA, in May 1973 says that “A variety of problems have been experienced with
psychological assessment of minority children. Traditional norm-referenced measurement has repeatedly
received criticism concerning cultural unfairness or bias. Responses to such accusations primarily have
been in the form of new instrumentation aimed at attaining a culture-fair assessment. Little response has
been evident from a conceptual standpoint addressing the issues of purpose and use of test results.
Although many have turned to criterion-referenced measurement as an answer to the problems of norm-
referenced evaluation, cultural bias is not necessarily avoided in this framework either. Issues of who
determines criteria and what those criteria include must be addressed if criterion-referenced
measurement is to meet adequately the challenge of multicultural evaluation.”

7. CONCLUSION

Evaluation is the process of making judgments about student learning and achievement, clinical
performance, employee competence and educational programs, based on assessment data.
Broadfoot (2007) emphasized that evaluation is about making judgments about quality; in nursing
education, evaluation typically takes the form of judging student attainment of the educational objectives
and goals in the classroom and the quality of student performance in clinical settings. With this
evaluation, learning outcomes are measured, further educational needs are identified, and additional
instruction can be provided to assist the students in their learning and in developing competencies for
practice.6

8. REFERENCE

1. Bharat Singh. Educational measurement and evaluation system. New Delhi: Anmol Publishers; 2006.
256 – 282.

2. Bloxham, S. and Boyd, P. Developing Effective Assessment in Higher Education: A Practical Guide,
London: McGraw-Hill Education and Open University Press; 2007.

3. Burton, K. Designing Criterion-Referenced Assessment. Journal of Learning Design. 2006; 1: 73 – 82.

4. Journal of University Teaching & Learning Practice. 2009; Volume 6, Issue 2, Article 6.

5. McDonald Mary. Systematic assessment of learning outcomes: developing multiple-choice exams.
London: Jones & Bartlett Learning; 2002. 15 – 17.

6. Oermann Marilyn & Gaberson Kathleen B. Evaluation and testing in nursing education. 3rd edition. New
York: Springer Publishing Company; 2009. 6 – 9.

7. Phillips Patricia Pulliam. ASTD Handbook of measuring and evaluating training. USA: The American
Society for Training and Development; 2010. 73 – 85.

8. Qureshi, M.V. Modern school psychology. New Delhi: Anmol Publishers; 2006. 152 – 166.

9. Reynolds Cecil R. & Kamphaus Randy W. Handbook of psychological and educational assessment
of children: Intelligence, Aptitude and Achievement. New York: Guilford Press; 2003. 375 – 404.

10. Sankaranarayanan B, Sindhu B. Learning and teaching nursing. Second Edition. Calicut: Brainfill;
2008. 2 – 4, 204 – 214.

11. Shrock Sharon A. & Coscarelli William C. Criterion-referenced test development: Technical and
legal guidelines for corporate training. San Francisco: John Wiley & Sons; 2007. 25 – 36.

12. Waltz Carolyn Feher & Strickland Ora Lea. Measurement in nursing and health research. New York:
Springer Publishing Company; 2010. 124 – 128.

13. www.712educators.com

14. www.altalang.com

15. www.chiron.valdosta.edu

16. www.leeds.ac.uk

17. www.qualityresearchinternational.com

18. www.sciencedirect.com

19. www.scribd.com

20. www.wikipedia.org
