ASSESSMENT OF LEARNING
Criterion-referenced measure is a measuring device with a predetermined level of success or standard on the part of the test-takers. For example, a level of 75 percent score in all the test items could be considered a satisfactory performance.
Norm-referenced measure is a test that is scored on the basis of the norm or standard level of accomplishment by the whole group taking the test. The grades of the students are based on the normal curve of distribution.
CRITERIA OF A GOOD EXAMINATION
A good examination must pass the following criteria:
Validity
Validity refers to the degree to which a test measures what it is intended to measure; it is the usefulness of the test for a given measure. A valid test is always reliable. To test the validity of a test, it is pretested in order to determine if it really measures what it intends to measure or what it purports to measure.
Reliability
Reliability pertains to the degree to which a test measures what it is supposed to measure consistently. The test of reliability is the consistency of the results when the test is administered to different groups of individuals with similar characteristics, in different places, at different times. Also, the results are almost similar when the test is given to the same group of individuals on different days and the coefficient of correlation is not less than 0.85.
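As a sketch of this test-retest check, a Pearson correlation between two administrations of the same test can be computed. The student scores below are hypothetical, invented only for illustration:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two sets of paired scores."""
    mx, my = mean(x), mean(y)
    n = len(x)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical scores of the same five students on two administrations
first  = [48, 42, 37, 45, 40]
second = [47, 43, 36, 46, 41]

r = pearson_r(first, second)
print(round(r, 2))   # the coefficient of correlation
print(r >= 0.85)     # does it meet the 0.85 benchmark?
```

Consistent results across the two administrations push the coefficient toward 1.0; a coefficient below 0.85 would signal, by the rule of thumb above, an unreliable test.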
Objectivity
Objectivity is the degree to which personal bias is eliminated in the scoring of the answers. When we refer to the quality of measurement, essentially we mean the amount of information contained in a score generated by the measurement. Measures of student instructional outcomes are rarely as precise as those of physical characteristics such as height and weight. Student outcomes are more difficult to define, and the units of measurement are usually not physical units. The measures we take on students vary in quality, which prompts the need for different scales of measurement. Terms that describe the levels of measurement in these scales are nominal, ordinal, interval, and ratio.
Measurements may differ in the amount of information the numbers contain. These
differences are distinguished by the terms nominal, ordinal, interval, and ratio scales of
measurement.
The terms nominal, ordinal, interval, and ratio actually form a hierarchy. Nominal scales of measurement are the least sophisticated and contain the least information. Ordinal, interval, and ratio scales increase respectively in sophistication. The arrangement is a hierarchy in which the higher levels contain the information of the lower levels, along with additional data. For example, numbers from an interval scale of measurement contain all of the information that nominal and ordinal scales would provide, plus some supplementary input. However, a ratio scale of the same attribute would contain even more information than the interval scale. This idea will become clearer as each scale of measurement is described.
Nominal Measurement
Nominal scales are the least sophisticated; they merely classify objects or events by assigning numbers to them. These numbers are arbitrary and imply no quantification, but the categories must be mutually exclusive and exhaustive. For example, one could nominally designate baseball positions by assigning the pitcher the numeral 1; the catcher, 2; the first baseman, 3; the second baseman, 4; and so on. These assignments are arbitrary; no arithmetic of these numbers is meaningful. For example, 1 plus 2 does not equal 3, because a pitcher plus a catcher does not equal a first baseman.
Ordinal Measurement
Ordinal scales classify, but they also assign rank order. An example of ordinal measurement is ranking individuals in a class according to their test scores. Student scores could be ordered from first, second, third, and so forth to the lowest score. Such a scale gives more information than nominal measurement, but it still has limitations. The units of ordinal measurement are most likely unequal. The number of points separating the first and second students probably does not equal the number separating the fifth and sixth students. These unequal units of measurement are analogous to a ruler in which some inches are longer than others. Addition and subtraction of such units yield meaningless numbers.
Interval Measurement
In order to be able to add and subtract scores, we use interval scales, sometimes called equal interval or equal unit measurement. This measurement scale contains the nominal and ordinal properties and is also characterized by equal units between score points. Examples include thermometers and calendar years. For instance, the difference in temperature between 10° and 20° is the same as that between 47° and 57°. Likewise, the difference in length of time between 1946 and 1948 equals that between 1973 and 1975. These measures are defined in terms of physical properties such that the intervals are equal. For example, a year is the time it takes for the earth to orbit the sun. The advantage of equal units of measurement is straightforward: sums and differences now make sense, both numerically and logically. Note, however, that the zero point in interval measurement is really an arbitrary decision; for example, 0° does not mean that there is no temperature.
Ratio Measurement
The most sophisticated type of measurement includes all the preceding properties, but in a ratio scale, the zero point is not arbitrary; a score of zero indicates the absence of what is being measured. For example, if a person's wealth equaled zero, he or she would have no wealth at all. This is unlike a social studies test, where missing every item (i.e., receiving a score of zero) may not indicate the complete absence of social studies knowledge. Ratio measurement is rarely achieved in educational assessment, either in cognitive or affective areas. The desirability of ratio measurement scales is that they allow ratio comparisons, such as: Ann is 1-1/2 times as tall as her little sister, Mary. We can seldom say that one's intelligence or achievement is 1-1/2 times as great as that of another person. An IQ of 120 may be 1-1/2 times as great numerically as an IQ of 80, but a person with an IQ of 120 is not 1-1/2 times as intelligent as a person with an IQ of 80.
Note that carefully designed tests over a specified domain of possible items can approach ratio measurement. For example, consider an objective concerning multiplication facts for pairs of numbers less than 10. In all, there are 45 such combinations. However, the teacher might randomly select 5 or 10 test problems to give to a particular student. Then, the proportion of items that the student gets correct could be used to estimate how many of the 45 possible items the student has mastered. If the student answers 4 of 5 items correctly, it is legitimate to estimate that the student would get 36 of the 45 items correct if all 45 items were administered. This is possible because the set of possible items was specifically defined in the objective, and the test items were a random, representative sample from that set. Most educational measurements are better than strictly nominal or ordinal measures, but few can meet the rigorous requirements of interval measurement. Educational testing usually falls somewhere between ordinal and interval scales in sophistication. Fortunately, empirical studies have shown arithmetic operations on these scales are appropriate, and the scores do provide adequate information for most decisions about students and instruction. Also, as we will see later, certain procedures can be applied to scores with reasonable confidence.
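The domain-estimation arithmetic above can be sketched in a few lines; the function name is hypothetical, chosen only for this illustration:

```python
def estimate_mastered(sample_correct, sample_size, domain_size):
    """Project the proportion correct on a random sample of items
    onto the full, explicitly defined item domain."""
    proportion = sample_correct / sample_size
    return round(proportion * domain_size)

# A student answers 4 of 5 sampled multiplication facts correctly;
# the defined domain holds 45 such facts in all.
print(estimate_mastered(4, 5, 45))  # -> 36
```

The estimate is defensible only because the domain (all 45 facts) was defined in the objective and the sampled items were drawn at random from it.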
Norm-Referenced and Criterion-Referenced Measurement
When we contrast norm-referenced measurement (or testing) with criterion-referenced measurement, we are basically referring to two different ways of interpreting information. However, Popham (1988, page 135) points out that certain characteristics tend to go with each type of measurement, and it is unlikely that results of norm-referenced tests are interpreted in criterion-referenced ways and vice versa.
Norm-referenced interpretation historically has been used in education; norm-referenced tests continue to comprise a substantial portion of the measurement in today's schools. The terminology of criterion-referenced measurement has existed for close to three decades, having been formally introduced with Glaser's (1963) classic article. Over the years, there has been occasional confusion with the terminology and how criterion-referenced measurement applies in the classroom. Do not infer that just because a test is published, it will necessarily be norm-referenced, or if teacher-constructed, criterion-referenced. Again, we emphasize that the type of measurement or testing depends on how the scores are interpreted. Both types can be used effectively by the teacher.
Norm-Referenced Interpretation
Norm-referenced interpretation stems from the desire to differentiate among individuals or to discriminate among the individuals of some defined group on whatever is being measured. In norm-referenced measurement, an individual's score is interpreted by comparing it to the scores of a defined group, often called the normative group. Norms represent the scores earned by one or more groups of students who have taken the test.
Norm-referenced interpretation is a relative interpretation based on an individual's position with respect to some group, often called the normative group. Norms consist of the scores, usually in some form of descriptive statistics, of the normative group.
In norm-referenced interpretation, the individual's position in the normative group is of concern; thus, this kind of positioning does not specify the performance in absolute terms. The norm being used is the basis of comparison, and the individual score is designated by its position in the normative group.
Achievement Test as an Example. Most standardized achievement tests, especially
those covering several skills and academic areas, are primarily designed for norm-referenced
interpretations. However, the form of results and the interpretations of these tests are
somewhat complex and require concepts not yet introduced in this text. Scores on
teacher-constructed tests are often given norm-referenced interpretations. Grading on the
curve, for example, is a norm-referenced interpretation of test scores on some type of
performance measure. Specified percentages of scores are assigned the different grades, and
an individual's score is positioned in the distribution of scores. (We mention this only as an
example; we do not endorse this procedure.)
Suppose an algebra teacher has a total of 150 students in five classes, and the classes have a common final examination. The teacher decides that the distribution of letter grades assigned to the final examination performance will be 10 percent As, 20 percent Bs, 40 percent Cs, 20 percent Ds, and 10 percent Fs. (Note that the final examination grade is not necessarily the course grade.) Since the grading is based on all 150 scores, do not assume that 3 students in each class will receive As on the final examination.
James receives a score on the final exam such that 21 students have higher scores and 128 students have lower scores. What will James's letter grade be on the exam? The top 15 scores will receive As, and the next 30 scores (20 percent of 150) will receive Bs. Counting from the top score down, James's score is positioned 22nd, so he will receive a B on the final examination. Note that in this interpretation example, we did not specify James's actual
numerical score on the exam. That would have been necessary in order to determine that his score positioned 22nd in the group of 150 scores. But the interpretation of the score was based strictly on its position in the total group of scores.
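As a sketch, the grade-on-the-curve positioning described above can be computed from a score's rank alone; the function name is invented, and the cut points simply restate the teacher's 10-20-40-20-10 split:

```python
def curve_grade(rank, n):
    """Norm-referenced letter grade from rank position (rank 1 = highest score),
    using cuts of 10% A, 20% B, 40% C, 20% D, 10% F."""
    p = rank / n
    if p <= 0.10:
        return "A"
    if p <= 0.30:
        return "B"
    if p <= 0.70:
        return "C"
    if p <= 0.90:
        return "D"
    return "F"

# James's score is positioned 22nd among the 150 scores
print(curve_grade(22, 150))  # -> B
```

Note that only the rank enters the computation; James's numerical score never appears, which is exactly the point of a norm-referenced interpretation.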
Criterion-Referenced Interpretation
The concepts of criterion-referenced testing have developed with a dual meaning for criterion-referenced. On one hand, it means referencing an individual's performance to some criterion that is a defined performance level. The individual's score is interpreted in absolute rather than relative terms. The criterion, in this situation, means some level of specified performance that has been determined independently of how others might perform.
A second meaning for criterion-referenced involves the idea of a defined behavioral domain; that is, a defined body of learner behaviors. The learner's performance on a test is referenced to a specifically defined group of behaviors. The criterion in this situation is the desired behaviors.
Criterion-referenced interpretation is an absolute rather than relative interpretation, referenced to a defined body of learner behaviors or, as is commonly done, to some specified level of performance.
Criterion-referenced tests require the specification of learner behaviors prior to constructing the test. The behaviors should be readily identifiable from instructional objectives. Criterion-referenced tests tend to focus on specific learner behaviors, and usually only a limited number are covered on any one test.
Suppose before the test is administered, an 80-percent-correct criterion is established as the minimum performance required for mastery of each objective. A student who does not attain the criterion has not mastered the skill sufficiently to move ahead in the instructional sequence. To a large extent, the criterion is based on teacher judgment. No magical, universal criterion for mastery exists, although some curriculum materials that contain criterion-referenced tests do suggest criteria for mastery. Also, unless the objectives are appropriate and the criterion for achievement relevant, there is little meaning in the attainment of a criterion, regardless of what it is.
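A minimal sketch of this criterion-referenced decision, assuming a 10-item test over a single objective (the function name is invented for illustration):

```python
def mastered(correct, total, criterion=0.80):
    """Criterion-referenced interpretation: absolute comparison of the
    proportion correct against a preset mastery cutoff."""
    return correct / total >= criterion

# 80-percent-correct criterion on a 10-item test for one objective
print(mastered(8, 10))  # True: move ahead in the instructional sequence
print(mastered(7, 10))  # False: the skill is not yet sufficiently mastered
```

Notice that no other student's score enters the decision; the interpretation is absolute, not relative.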
Distinctions between Norm-Referenced and Criterion-Referenced Tests
Although interpretations, not characteristics, provide the distinction between
norm-referenced and criterion-referenced tests, the two types do tend to differ in some ways.
Norm-referenced tests are usually more general and comprehensive and cover a large domain
of content and learning tasks. They are used for survey testing, although this is not their exclusive use.
Criterion-referenced tests focus on a specific group of learner behaviors. To show the contrast, consider an example. Arithmetic skills represent a general and broad category of student outcomes and would likely be measured by a norm-referenced test. On the other hand, behaviors such as solving addition problems with two five-digit numbers or determining the multiplication products of three- and four-digit numbers are much more specific and may be measured by criterion-referenced tests.
A criterion-referenced test tends to focus more on subskills than on broad skills. Thus, criterion-referenced tests tend to be shorter. If mastery learning is involved, criterion-referenced measurement would be used.
Norm-referenced test scores are transformed to positions within the normative group. Criterion-referenced test scores are usually given in the percentage of correct answers or another indicator of mastery or the lack thereof. Criterion-referenced tests tend to lend
themselves more to individualizing instruction than do norm-referenced tests. In individual.
izing instruction, a student's performance is interpreted more appropriately by comparison to
the desired behaviors for that particular student, rather than by comparison with the perform.
ance of a group.
Norm-referenced test items tend to be of average difficulty. Criterion-referenced tests have item difficulty matched to the learning tasks. This distinction in item difficulty is necessary because norm-referenced tests emphasize the discrimination among individuals and criterion-referenced tests emphasize the description of performance. Easy items, for example, do little for discriminating among individuals, but they may be necessary for describing performance.
Finally, when measuring attitudes, interests, and aptitudes, it is practically impossible to interpret the results without comparing them to a reference group. The reference groups in such cases are usually typical students or students with high interests in certain areas. Teachers have no basis for anticipating these kinds of scores; therefore, in order to ascribe meaning to such a score, a referent group must be used. For instance, a score of 80 on an interest inventory has no meaning in itself. On the other hand, if a score of 80 is the typical response by a group interested in mechanical areas, the score takes on meaning.
STAGES IN TEST CONSTRUCTION
I. Planning the Test
A. Determining the Objectives
B. Preparing the Table of Specifications
C. Selecting the Appropriate Item Format
D. Writing the Test Items
E. Editing the Test Items
II. Trying Out the Test
A. Administering the First Tryout - then Item Analysis
B. Administering the Second Tryout - then Item Analysis
C. Preparing the Final Form of the Test
III. Establishing Test Validity
IV. Establishing the Test Reliability
V. Interpreting the Test Score
MAJOR CONSIDERATIONS IN TEST CONSTRUCTION
The following are the major considerations in test construction:
Type of Test
Our usual idea of testing is an in-class test that is administered by the teacher. However, there are many variations on this theme: group tests, individual tests, written tests, oral tests, speed tests, power tests, pretests and posttests. Each of these has different characteristics that must be considered when the tests are planned.
If it is a take-home test rather than an in-class test, how do you make sure that students work independently, have equal access to sources and resources, or spend a sufficient but not enormous amount of time on the task? If it is a pretest, should it exactly match the posttest so that a gain score can be computed, or should the pretest contain items that are diagnostic of prerequisite skills and knowledge? If it is an achievement test, should partial credit be awarded, should there be penalties for guessing, or should points be deducted for grammar and spelling errors?
Obviously, the test plan must include a wide array of issues. Anticipating these potential problems allows the test constructor to develop positions or policies that are consistent with his or her testing philosophy. These can then be communicated to students, administrators, parents, and others who may be affected by the testing program. Make a list of the objectives, the subject matter taught, and the activities undertaken. These are contained in the daily lesson plans of the teacher and in the references or textbook used. Such tests are usually very indirect methods that only approximate real-world applications. The constraints in classroom testing are often due to time and the developmental level of the students.
Test Length
A major decision in test planning is how many items should be included on the test. There should be enough to cover the content adequately, but the length of the class period or the attention span or fatigue limits of the students usually restrict the test length. Decisions about test length are usually based on practical constraints more than on theoretical considerations.
Most teachers want test scores to be determined by how much the student understands rather than by how quickly he or she answers the questions. Thus, teachers prefer power tests, where at least 90 percent of the students have time to attempt 90 percent of the test items. Just how many items will fit into a given test occasion is something that is learned through experience with similar groups of students.
Item Formats
Determining what kind of items to include on the test is a major decision. Should they be objectively scored formats such as multiple choice or matching type? Should they cause the students to organize their own thoughts through short answer or essay formats? These are important questions that can be answered only by the teacher in terms of the local context, his or her students, his or her classroom, and the specific purpose of the test. Once the planning decisions are made, the item writing begins. This task is often the most feared by beginning test constructors. However, the procedures are more common sense than formal rules.
POINTS TO BE CONSIDERED IN PREPARING A TEST
1. Are the instructional objectives clearly defined?
2. What knowledge, skills and attitudes do you want to measure?
3. Did you prepare a table of specifications?
4. Did you formulate well defined and clear test items?
5. Did you employ correct English in writing the items?
6. Did you avoid giving clues to the correct answer?
7. Did you test the important ideas rather than the trivial?
8. Did you adapt the test's difficulty to your students' ability?
9. Did you avoid using textbook jargon?
10. Did you cast the items in positive form?
11. Did you prepare a scoring key?
12. Does each item have a single correct answer?
13. Did you review your items?
GENERAL PRINCIPLES IN CONSTRUCTING
DIFFERENT TYPES OF TESTS
1. The test items should be selected very carefully. Only important facts should be included.
3. Enumeration type
a. The exact number of expected answers should be stated.
b. Blanks should be of equal lengths.
c. Score is the number of correct answers.
4. Identification type
a. The items should make an examinee think of a word, number, or group of words that would complete the statement or answer the problem.
b. Score is the number of correct answers.
B. RECOGNITION TYPES
1. True-false or alternate-response type
a. Declarative sentences should be used.
b. The number of "true" and "false" items should be more or less equal.
c. The truth or falsity of the sentence should not be too evident.
d. Negative statements should be avoided.
e. The "modified true-false" is preferable to the "plain true-false".
f. In arranging the items, avoid the regular recurrence of "true" and "false" statements.
g. Avoid using specific determiners like all, always, never, none, nothing, most, often, some, etc., and avoid weak statements such as may, sometimes, as a rule, in general, etc.
h. Minimize the use of qualitative terms like few, great, many, more, etc.
i. Avoid leading clues to answers in all items.
j. Score is the number of correct answers in "modified true-false" and right answers minus wrong answers in "plain true-false".
2. Yes-No type
a. The items should be in interrogative sentences.
b. The same rules as in "true-false" are applied.
3. Multiple-response type
a. There should be three to five choices. The number of choices used in the first item should be the same in all the items of this type of test.
b. The choices should be numbered or lettered so that only the number or letter can be written on the blank provided.
c. If the choices are figures, they should be arranged in ascending order.
d. Avoid the use of "a" or "an" as the last word prior to the listing of the responses.
e. Random occurrence of responses should be employed.
f. The choices, as much as possible, should be at the end of the statements.
g. The choices should be related in some way or should belong to the same class.
h. Avoid the use of "none of these" as one of the choices.
i. Score is the number of correct answers.
4. Best answer type
a. There should be three to five choices, all of which are right but vary in their degree of merit, importance, or desirability.
b. The other rules for multiple-response items are applied here.
c. Score is the number of correct answers.
5. Matching type
a. There should be two columns. Under column "A" are the stimuli, which should be longer and more descriptive than the responses under column "B". The response may be a word, a phrase, a number, or a formula.
b. The stimuli under column "A" should be numbered and the responses under column "B" should be lettered. Answers will be indicated by letters only on lines provided in column "A".
c. The number of pairs usually should not exceed twenty items. Fewer than ten introduces chance elements. Twenty pairs may be used, but more than twenty is decidedly wasteful of time.
d. The number of responses in column "B" should be two or more than the number of items in column "A" to avoid guessing.
e. Only one correct matching for each item should be possible.
f. Matching sets should neither be too long nor too short.
g. All items should be on the same page to avoid turning of pages in the process of matching pairs.
h. Score is the number of correct answers.
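The two true-false scoring rules above (number correct for "modified true-false", rights minus wrongs for "plain true-false") can be sketched as follows; the function names and the sample key are invented for illustration:

```python
def modified_true_false_score(responses, key):
    """Modified true-false: score is simply the number of correct answers."""
    return sum(1 for r, k in zip(responses, key) if r == k)

def plain_true_false_score(responses, key):
    """Plain true-false: score is rights minus wrongs; blanks (None)
    neither earn nor cost points."""
    rights = sum(1 for r, k in zip(responses, key) if r == k)
    wrongs = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    return rights - wrongs

key = [True, False, True, True, False]
answers = [True, False, False, True, None]   # three right, one wrong, one blank
print(modified_true_false_score(answers, key))  # -> 3
print(plain_true_false_score(answers, key))     # -> 2
```

The rights-minus-wrongs rule is a guessing correction: a student who marks items at random gains nothing on average.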
C. ESSAY TYPE EXAMINATIONS
Common types of essay questions. (The types are related to the purposes for which the essay examinations are to be used.)
1. Comparison of two things
2. Explanation of the use or meaning of a statement or passage.
3. Analysis
4. Decisions for or against
5. Discussion
How to construct essay examinations.
1. Determine the objectives or essentials for each question to be evaluated.
2. Phrase questions in simple, clear and concise language.
3. Suit the length of the questions to the time available for answering the essay examination. The teacher should try to answer the test herself.
4. Scoring:
a. Have a model answer in advance.
b. Indicate the number of points for each question.
c. Score a point for each essential.
ADVANTAGES AND DISADVANTAGES
OF THE OBJECTIVE TYPE OF TESTS
Advantages
a. The objective test is free from personal bias in scoring.
b. It is easy to score. With a scoring key, the test can be corrected by different individuals without affecting the accuracy of the grades given.
c. It has high validity because it is comprehensive, with wide sampling of essentials.
d. It is less time-consuming since many items can be answered in a given time.
e. It is fair to students since the slow writers can accomplish the test as fast as the fast writers.
Disadvantages
a. It is difficult to construct and requires more time to prepare.
b. It does not afford the students the opportunity for training in self-expression and thought organization.
c. It cannot be used to test ability in theme writing or journalistic writing.
ADVANTAGES AND DISADVANTAGES
OF THE ESSAY TYPE OF TESTS
Advantages
a. The essay examination can be used in practically all subjects of the school curriculum.
b. It trains students for thought organization and self-expression.
c. It affords students opportunities to express their originality and independence of thinking.
d. Only the essay test can be used in some subjects like composition writing and journalistic writing, which cannot be tested by the objective type test.
e. The essay examination measures higher mental abilities like comparison, interpretation, criticism, defense of opinion, and decision.
f. The essay test is easily prepared.
g. It is inexpensive.
Disadvantages
a. The limited sampling of items makes the test an unreliable measure of achievements or abilities.
b. Questions usually are not well prepared.
c. Scoring is highly subjective due to the influence of the corrector's personal judgment.
d. Grading of the essay test is an inaccurate measure of pupils' achievements due to subjectivity of scoring.
STATISTICAL MEASURES OR TOOLS
USED IN INTERPRETING NUMERICAL DATA
Frequency Distributions
A simple, common sense technique for describing a set of test scores is through the use of a frequency distribution. A frequency distribution is merely a listing of the possible score values and the number of persons who achieved each score. Such an arrangement presents the scores in a simpler and more understandable manner than merely listing all of the separate scores. Consider a specific set of scores to clarify these ideas.
A set of scores for a group of 25 students who took a 50-item test is listed in Table 1. It is easier to analyze the scores if they are arranged in a simple frequency distribution. (The frequency distribution for the same set of scores is given in Table 2.) The steps that are involved in creating the frequency distribution are:
First, list the possible score values in rank order, from highest to lowest. Then, a second column indicates the frequency or number of persons who received each score. For example, three students received a score of 47, two received 40, and so forth. There is no need to list score values below the lowest score that anyone received.
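The steps above can be sketched in Python. The scores are those of Table 1, with student N's partly illegible score taken as 43 so that the counts agree with Tables 2 and 4:

```python
from collections import Counter

scores = [48, 50, 46, 41, 37, 48, 38, 47, 49, 44, 48, 49, 40,
          43, 47, 48, 42, 44, 38, 49, 34, 35, 47, 40, 48]  # Table 1

counts = Counter(scores)
# List every value from the highest observed score down to the lowest
for value in range(max(scores), min(scores) - 1, -1):
    print(value, counts[value])
```

The printed pairs reproduce Table 2: for example, 47 appears three times and 40 twice.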
Table 1. Scores of 25 Students on a 50-Item Test

Student  Score    Student  Score
A        48       N        43
B        50       O        47
C        46       P        48
D        41       Q        42
E        37       R        44
F        48       S        38
G        38       T        49
H        47       U        34
I        49       V        35
J        44       W        47
K        48       X        40
L        49       Y        48
M        40
Table 2. Frequency Distribution of the 25 Scores of Table 1

Score  Frequency    Score  Frequency
50     1            41     1
49     3            40     2
48     5            39     0
47     3            38     2
46     1            37     1
45     0            36     0
44     2            35     1
43     1            34     1
42     1
When there is a wide range of scores in a frequency distribution, the distribution can be quite long, with a lot of zeros in the column of frequencies. Such a frequency distribution can make interpretation of the scores difficult and confusing. A grouped frequency distribution would be more appropriate in this kind of situation. Groups of score values are listed rather than each separate possible score value.
If we were to change the frequency distribution in Table 2 into a grouped frequency distribution, we might choose intervals such as 48-50, 45-47, and so forth. The frequency corresponding to interval 48-50 would be 9 (1 + 3 + 5). The choice of the width of the interval is arbitrary, but it must be the same for all intervals. In addition, it is a good idea to have an odd-numbered interval width (we used 3 above) so that the midpoint of the interval is a whole number. This strategy will simplify subsequent graphs and description of the data. The grouped frequency distribution is presented in Table 3.
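Grouping can be sketched the same way; this uses the Table 1 scores (student N taken as 43) and the width-3 intervals chosen above:

```python
scores = [48, 50, 46, 41, 37, 48, 38, 47, 49, 44, 48, 49, 40,
          43, 47, 48, 42, 44, 38, 49, 34, 35, 47, 40, 48]

width = 3   # odd width, so every interval midpoint is a whole number
# Intervals 48-50, 45-47, ..., 33-35, from highest to lowest
for start in range(48, 32, -width):
    end = start + width - 1
    freq = sum(1 for s in scores if start <= s <= end)
    print(f"{start}-{end}", freq)
```

The output reproduces Table 3, with the 48-50 interval collecting 9 scores.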
Table 3. Grouped Frequency Distribution

Score Interval  Frequency
48-50           9
45-47           4
42-44           4
39-41           3
36-38           3
33-35           2
Frequency distributions summarize sets of test scores by listing the number of people who received each test score. All of the test scores can be listed separately, or the scores can be grouped in a frequency distribution.
MEASURES OF CENTRAL TENDENCY
Frequency distributions are helpful for indicating the shape of a distribution of scores, but we need more information than the shape to describe a distribution adequately. We need to know where on the scale of measurement a distribution is located and how the scores are dispersed in the distribution. For the former, we compute measures of central tendency, and for the latter, we compute measures of dispersion. Measures of central tendency are points on the scale of measurement, and they are representative of how the scores tend to average. There are three commonly used measures of central tendency: the mean, the median, and the mode, but the mean is by far the most widely used.
The Mean
The mean of a set of scores is the arithmetic mean. It is found by summing the scores and dividing the sum by the number of scores. The mean is the most commonly used measure of central tendency because it is easily understood and is based on all of the scores in the set; hence, it summarizes a lot of information. The formula for the mean is as follows:

X̄ = ΣX / N

where
X̄ is the mean,
X is the symbol for a score,
Σ is the summation operator (it tells us to add all the Xs), and
N is the number of scores.

For the set of scores in Table 1, ΣX = 1100 and N = 25, so

X̄ = 1100 / 25 = 44

The mean of the set of scores in Table 1 is 44. The mean does not have to equal an observed score; it is usually not even a whole number.
When the scores are arranged in a grouped frequency distribution, the formula is:

X̄ = Σ(f · X_midpoint) / N

where f · X_midpoint means that the midpoint of the interval is multiplied by the frequency for that interval. In computing the mean for the scores in Table 3, using this formula we obtain:

X̄ = [9(49) + 4(46) + 4(43) + 3(40) + 3(37) + 2(34)] / 25 = 43.84

Note that this mean is slightly different from the mean using ungrouped data. This difference is due to the midpoint representing the scores in the interval rather than using the actual scores.
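Both computations can be checked in a few lines of Python, using the Table 1 scores (student N taken as 43) and the Table 3 intervals:

```python
scores = [48, 50, 46, 41, 37, 48, 38, 47, 49, 44, 48, 49, 40,
          43, 47, 48, 42, 44, 38, 49, 34, 35, 47, 40, 48]

# Ungrouped mean: sum of the scores divided by N
mean = sum(scores) / len(scores)
print(mean)  # -> 44.0

# Grouped mean: each interval midpoint weighted by its frequency (Table 3)
intervals = [(49, 9), (46, 4), (43, 4), (40, 3), (37, 3), (34, 2)]
grouped_mean = sum(m * f for m, f in intervals) / sum(f for _, f in intervals)
print(grouped_mean)  # -> 43.84
```

The small discrepancy (44 versus 43.84) is exactly the grouping error the text describes: midpoints stand in for the actual scores.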
The Median
Another measure of central tendency is the median, which is the point that divides the distribution in half; that is, half of the scores fall above the median and half of the scores fall below the median.
When there are only a few scores, the median can often be found by inspection. If there is an odd number of scores, the middle score is the median. When there is an even number of scores, the median is halfway between the two middle scores. However, when there are tied scores in the middle of the distribution, or when the scores are in a frequency distribution, the median may not be so obvious.
Consider again the frequency distribution in Table 2. There were 25 scores in the distribution, so the middle score should be the median. A straightforward way to find this median is to augment the frequency distribution with a column of cumulative frequencies. Cumulative frequencies indicate the number of scores at or below each score. Table 4 indicates the cumulative frequencies for the data in Table 2.
Table 4. Frequency Distribution, Cumulative Frequencies for the Scores of Table 2

Score  Frequency  Cumulative Frequency
50     1          25
49     3          24
48     5          21
47     3          16
46     1          13
45     0          12
44     2          12
43     1          10
42     1          9
41     1          8
40     2          7
39     0          5
38     2          5
37     1          3
36     0          2
35     1          2
34     1          1
For example, 7 persons scored at or below a score of 40, and 21 persons scored at or below a score of 48.
To find the median, we need to locate the middle score in the cumulative frequency column, because this score is the median. Since there are 25 scores in the distribution, the middle one is the 13th, a score of 46. Thus, 46 is the median of this distribution; half of the people scored above 46 and half scored below.
When there are ties in the middle of the distribution, there may be a need to interpolate between scores to get the exact median. However, such precision is not needed for most classroom tests. The whole number closest to the median is usually sufficient.
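The cumulative-frequency walk described above can be sketched as follows (the function name is invented; the scores are those of Table 1, with student N taken as 43):

```python
from collections import Counter

def median_by_cumulative(freq_dist, n):
    """Walk up the scores, accumulating frequencies as in Table 4,
    until the cumulative count reaches the middle position."""
    middle = (n + 1) // 2          # the 13th score when n = 25
    cumulative = 0
    for score in sorted(freq_dist):
        cumulative += freq_dist[score]
        if cumulative >= middle:
            return score

scores = [48, 50, 46, 41, 37, 48, 38, 47, 49, 44, 48, 49, 40,
          43, 47, 48, 42, 44, 38, 49, 34, 35, 47, 40, 48]
print(median_by_cumulative(Counter(scores), len(scores)))  # -> 46
```

The cumulative column first reaches 13 at a score of 46, matching Table 4 and the whole-number median used for classroom purposes.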
The Mode
The measure of central tendency that is the easiest to find is the mode. The mode is the most frequently occurring score in the distribution. The mode of the scores in Table 1 is 48. Five persons had scores of 48 and no other score occurred as often.
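With a tally in hand, the mode falls out directly; again the Table 1 scores (student N taken as 43):

```python
from collections import Counter

scores = [48, 50, 46, 41, 37, 48, 38, 47, 49, 44, 48, 49, 40,
          43, 47, 48, 42, 44, 38, 49, 34, 35, 47, 40, 48]

# most_common(1) returns the (score, frequency) pair with the highest count
mode, count = Counter(scores).most_common(1)[0]
print(mode, count)  # -> 48 5
```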