ASSESSMENT OF LEARNING

Criterion-referenced measure is a measuring device with a predetermined level of success or standard on the part of the test-takers. For example, a level of 75 percent score in all the test items could be considered a satisfactory performance.

Norm-referenced measure is a test that is scored on the basis of the norm or standard level of accomplishment by the whole group taking the test. The grades of the students are based on the normal curve of distribution.

CRITERIA OF A GOOD EXAMINATION

A good examination must pass the following criteria:

Validity
Validity refers to the degree to which a test measures what it is intended to measure; it is the usefulness of the test for a given measure. A valid test is always reliable. To test the validity of a test, it is pretested in order to determine if it really measures what it intends to measure or what it purports to measure.

Reliability
Reliability pertains to the consistency with which a test measures whatever it measures. The test of reliability is the consistency of the results when it is administered to different groups of individuals with similar characteristics in different places at different times. Also, the results are almost similar when the test is given to the same group of individuals on different days, and the coefficient of correlation is not less than 0.85.

Objectivity
Objectivity is the degree to which personal bias is eliminated in the scoring of the answers. When we refer to the quality of measurement, essentially we mean the amount of information contained in a score generated by the measurement.

Measures of student instructional outcomes are rarely as precise as those of physical characteristics such as height and weight. Student outcomes are more difficult to define, and the units of measurement are usually not physical units. The measures we take on students vary in quality, which prompts the need for different scales of measurement.
Terms that describe the levels of measurement in these scales are nominal, ordinal, interval, and ratio. Measurements may differ in the amount of information the numbers contain. These differences are distinguished by the terms nominal, ordinal, interval, and ratio scales of measurement.

The terms nominal, ordinal, interval, and ratio actually form a hierarchy. Nominal scales of measurement are the least sophisticated and contain the least information. Ordinal, interval, and ratio scales increase respectively in sophistication. The arrangement is a hierarchy in that the higher levels contain the information of the lower levels, along with additional data. For example, numbers from an interval scale of measurement contain all of the information that nominal and ordinal scales would provide, plus some supplementary input. However, a ratio scale of the same attribute would contain even more information than the interval scale. This idea will become more clear as each scale of measurement is described.

Nominal Measurement
Nominal scales are the least sophisticated; they merely classify objects or events by assigning numbers to them. These numbers are arbitrary and imply no quantification, but the categories must be mutually exclusive and exhaustive. For example, one could nominally designate baseball positions by assigning the pitcher the numeral 1; the catcher, 2; the first baseman, 3; the second baseman, 4; and so on. These assignments are arbitrary; no arithmetic of these numbers is meaningful. For example, 1 plus 2 does not equal 3, because a pitcher plus a catcher does not equal a first baseman.

Ordinal Measurement
Ordinal scales classify, but they also assign rank order. An example of ordinal measurement is ranking individuals in a class according to their test scores. Student scores could be ordered from first, second, third, and so forth to the lowest score. Such a scale gives more information than nominal measurement, but it still has limitations.
The units of ordinal measurement are most likely unequal. The number of points separating the first and second students probably does not equal the number separating the fifth and sixth students. These unequal units of measurement are analogous to a ruler in which some inches are longer than others. Addition and subtraction of such units yield meaningless numbers.

Interval Measurement
In order to be able to add and subtract scores, we use interval scales, sometimes called equal interval or equal unit measurement. This measurement scale contains the nominal and ordinal properties and is also characterized by equal units between score points. Examples include thermometers and calendar years. For instance, the difference in temperature between 10° and 20° is the same as that between 47° and 57°. Likewise, the difference in length of time between 1946 and 1948 equals that between 1973 and 1975. These measures are defined in terms of physical properties such that the intervals are equal. For example, a year is the time it takes for the earth to orbit the sun. The advantage of equal units of measurement is straightforward: Sums and differences now make sense, both numerically and logically. Note, however, that the zero point in interval measurement is really an arbitrary decision; for example, 0° does not mean that there is no temperature.

Ratio Measurement
The most sophisticated type of measurement includes all the preceding properties, but in a ratio scale, the zero point is not arbitrary; a score of zero indicates the absence of what is being measured. For example, if a person's wealth equalled zero, he or she would have no wealth at all. This is unlike a social studies test, where missing every item (i.e., receiving a score of zero) may not indicate the complete absence of social studies knowledge. Ratio measurement is rarely achieved in educational assessment, either in cognitive or affective areas.
The desirability of ratio measurement scales is that they allow ratio comparisons, such as Ann is 1-1/2 times as tall as her little sister, Mary. We can seldom say that one's intelligence or achievement is 1-1/2 times as great as that of another person. An IQ of 120 may be 1-1/2 times as great numerically as an IQ of 80, but a person with an IQ of 120 is not 1-1/2 times as intelligent as a person with an IQ of 80.

Note that carefully designed tests over a specified domain of possible items can approach ratio measurement. For example, consider an objective concerning multiplication facts for pairs of numbers less than 10. In all, there are 45 such combinations. However, the teacher might randomly select 5 or 10 test problems to give to a particular student. Then, the proportion of items that the student gets correct could be used to estimate how many of the 45 possible items the student has mastered. If the student answers 4 of 5 items correctly, it is legitimate to estimate that the student would get 36 of the 45 items correct if all 45 items were administered. This is possible because the set of possible items was specifically defined in the objective, and the test items were a random, representative sample from that set.

Most educational measurements are better than strictly nominal or ordinal measures, but few can meet the rigorous requirements of interval measurement. Educational testing usually falls somewhere between ordinal and interval scales in sophistication. Fortunately, empirical studies have shown arithmetic operations on these scales are appropriate, and the scores do provide adequate information for most decisions about students and instruction. Also, as we will see later, certain procedures can be applied to scores with reasonable confidence.
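The estimate described above can be sketched in a few lines of Python. The function name and the explicit listing of the 45-pair domain are illustrative choices, not part of the text:

```python
# Hypothetical domain: the 45 unordered pairs of factors from 1 to 9
domain = [(a, b) for a in range(1, 10) for b in range(a, 10)]
assert len(domain) == 45

def estimate_mastered(items_given, items_correct, domain_size=45):
    """Estimate how many items of the defined domain a student has
    mastered, from performance on a random sample of items."""
    proportion = items_correct / items_given
    return round(proportion * domain_size)

# The text's example: 4 of 5 sampled items correct -> estimated 36 of 45
print(estimate_mastered(5, 4))  # 36
```

The estimate is defensible only under the text's conditions: the domain is fully specified in the objective, and the sampled items are a random, representative draw from it.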
Norm-Referenced and Criterion-Referenced Measurement
When we contrast norm-referenced measurement (or testing) with criterion-referenced measurement, we are basically referring to two different ways of interpreting information. However, Popham (1988, page 135) points out that certain characteristics tend to go with each type of measurement, and it is unlikely that results of norm-referenced tests are interpreted in criterion-referenced ways and vice versa.

Norm-referenced interpretation historically has been used in education; norm-referenced tests continue to comprise a substantial portion of the measurement in today's schools. The terminology of criterion-referenced measurement has existed for close to three decades, having been formally introduced with Glaser's (1963) classic article. Over the years, there has been occasional confusion with the terminology and how criterion-referenced measurement applies in the classroom. Do not infer that just because a test is published, it will necessarily be norm-referenced, or if teacher-constructed, criterion-referenced. Again, we emphasize that the type of measurement or testing depends on how the scores are interpreted. Both types can be used effectively by the teacher.

Norm-Referenced Interpretation
Norm-referenced interpretation stems from the desire to differentiate among individuals or to discriminate among the individuals of some defined group on whatever is being measured. In norm-referenced measurement, an individual's score is interpreted by comparing it to the scores of a defined group, often called the normative group. Norms represent the scores earned by one or more groups of students who have taken the test.

Norm-referenced interpretation is a relative interpretation based on an individual's position with respect to some group, often called the normative group. Norms consist of the scores, usually in some form of descriptive statistics, of the normative group.
In norm-referenced interpretation, the individual's position in the normative group is of concern; thus, this kind of positioning does not specify the performance in absolute terms. The norm being used is the basis of comparison, and the individual score is designated by its position in the normative group.

Achievement Test as an Example. Most standardized achievement tests, especially those covering several skills and academic areas, are primarily designed for norm-referenced interpretations. However, the form of results and the interpretations of these tests are somewhat complex and require concepts not yet introduced in this text.

Scores on teacher-constructed tests are often given norm-referenced interpretations. Grading on the curve, for example, is a norm-referenced interpretation of test scores on some type of performance measure. Specified percentages of scores are assigned the different grades, and an individual's score is positioned in the distribution of scores. (We mention this only as an example; we do not endorse this procedure.)

Suppose an algebra teacher has a total of 150 students in five classes, and the classes have a common final examination. The teacher decides that the distribution of letter grades assigned to the final examination performance will be 10 percent As, 20 percent Bs, 40 percent Cs, 20 percent Ds, and 10 percent Fs. (Note that the final examination grade is not necessarily the course grade.) Since the grading is based on all 150 scores, do not assume that 3 students in each class will receive As on the final examination. James receives a score on the final exam such that 21 students have higher scores and 128 students have lower scores. What will James's letter grade be on the exam? The top 15 scores will receive As, and the next 30 scores (20 percent of 150) will receive Bs. Counting from the top score down, James's score is positioned 22nd, so he will receive a B on the final examination.
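James's grade can be worked out mechanically from his rank. The following Python sketch (the function name and parameter layout are my own, not from the text) positions a rank within the fixed percentage distribution the teacher chose:

```python
def curve_grade(rank, n_students, cuts=(0.10, 0.20, 0.40, 0.20, 0.10)):
    """Assign a letter grade from a student's rank (1 = highest score)
    under a fixed percentage distribution of A, B, C, D, F."""
    grades = "ABCDF"
    boundary = 0.0
    for grade, pct in zip(grades, cuts):
        boundary += pct * n_students  # cumulative cutoff for this grade
        if rank <= boundary:
            return grade
    return "F"

# James: 21 students scored higher, so his rank is 22nd of 150.
print(curve_grade(22, 150))  # B  (top 15 get A, next 30 get B)
```

Note the design point made in the text: the input is a position in the group, not James's actual numerical score.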
Note that in this interpretation example, we did not specify James's actual numerical score on the exam. That would have been necessary in order to determine that his score positioned 22nd in the group of 150 scores. But in terms of the interpretation of the score, it was based strictly on its position in the total group of scores.

Criterion-Referenced Interpretation
The concepts of criterion-referenced testing have developed with a dual meaning for criterion-referenced. On one hand, it means referencing an individual's performance to some criterion that is a defined performance level. The individual's score is interpreted in absolute rather than relative terms. The criterion, in this situation, means some level of specified performance that has been determined independently of how others might perform. A second meaning for criterion-referenced involves the idea of a defined behavioral domain, that is, a defined body of learner behaviors. The learner's performance on a test is referenced to a specifically defined group of behaviors. The criterion in this situation is the desired behaviors.

Criterion-referenced interpretation is an absolute rather than relative interpretation, referenced to a defined body of learner behaviors or, as is commonly done, to some specified level of performance.

Criterion-referenced tests require the specification of learner behaviors prior to constructing the test. The behaviors should be readily identifiable from instructional objectives. Criterion-referenced tests tend to focus on specific learner behaviors, and usually only a limited number are covered on any one test.

Suppose before the test is administered, an 80-percent-correct criterion is established as the minimum performance required for mastery of each objective.
A student who does not attain the criterion has not mastered the skill sufficiently to move ahead in the instructional sequence. To a large extent, the criterion is based on teacher judgment. No magical, universal criterion for mastery exists, although some curriculum materials that contain criterion-referenced tests do suggest criteria for mastery. Also, unless objectives are appropriate and the criterion for achievement relevant, there is little meaning in the attainment of a criterion, regardless of what it is.

Distinctions between Norm-Referenced and Criterion-Referenced Tests
Although interpretations, not characteristics, provide the distinction between norm-referenced and criterion-referenced tests, the two types do tend to differ in some ways. Norm-referenced tests are usually more general and comprehensive and cover a large domain of content and learning tasks. They are used for survey testing, although this is not their exclusive use. Criterion-referenced tests focus on a specific group of learner behaviors.

To show the contrast, consider an example. Arithmetic skills represent a general and broad category of student outcomes and would likely be measured by a norm-referenced test. On the other hand, behaviors such as solving addition problems with two five-digit numbers or determining the multiplication products of three- and four-digit numbers are much more specific and may be measured by criterion-referenced tests. A criterion-referenced test tends to focus more on subskills than on broad skills. Thus, criterion-referenced tests tend to be shorter. If mastery learning is involved, criterion-referenced measurement would be used.

Norm-referenced test scores are transformed to positions within the normative group. Criterion-referenced test scores are usually given in the percentage of correct answers or another indicator of mastery or the lack thereof.
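The mastery decision itself is a simple absolute comparison. A minimal sketch, assuming the 80-percent criterion discussed above (the function name is hypothetical):

```python
def mastered(correct, total, criterion=0.80):
    """Check mastery of one objective against a preset criterion.
    The 0.80 default reflects the teacher-judgment cutoff in the text;
    no universal criterion exists."""
    return correct / total >= criterion

# e.g., 8 of 10 items correct meets an 80-percent criterion; 7 does not
print(mastered(8, 10))  # True
print(mastered(7, 10))  # False
```

Unlike the grading-on-the-curve example, no reference group appears anywhere: the score is interpreted against the criterion alone.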
Criterion-referenced tests tend to lend themselves more to individualizing instruction than do norm-referenced tests. In individualizing instruction, a student's performance is interpreted more appropriately by comparison to the desired behaviors for that particular student, rather than by comparison with the performance of a group.

Norm-referenced test items tend to be of average difficulty. Criterion-referenced tests have item difficulty matched to the learning tasks. This distinction in item difficulty is necessary because norm-referenced tests emphasize the discrimination among individuals and criterion-referenced tests emphasize the description of performance. Easy items, for example, do little for discriminating among individuals, but they may be necessary for describing performance.

Finally, when measuring attitudes, interests, and aptitudes, it is practically impossible to interpret the results without comparing them to a reference group. The reference groups in such cases are usually typical students or students with high interests in certain areas. Teachers have no basis for anticipating these kinds of scores; therefore, in order to ascribe meaning to such a score, a referent group must be used. For instance, a score of 80 on an interest inventory has no meaning in itself. On the other hand, if a score of 80 is the typical response by a group interested in mechanical areas, the score takes on meaning.

STAGES IN TEST CONSTRUCTION
I. Planning the Test
   A. Determining the Objectives
   B. Preparing the Table of Specifications
   C. Selecting the Appropriate Item Format
   D. Writing the Test Items
   E. Editing the Test Items
II. Trying Out the Test
   A. Administering the First Tryout - then Item Analysis
   B. Administering the Second Tryout - then Item Analysis
   C. Preparing the Final Form of the Test
III. Establishing Test Validity
IV. Establishing the Test Reliability
V.
Interpreting the Test Score

MAJOR CONSIDERATIONS IN TEST CONSTRUCTION
The following are the major considerations in test construction:

Type of Test
Our usual idea of testing is an in-class test that is administered by the teacher. However, there are many variations on this theme: group tests, individual tests, written tests, oral tests, speed tests, power tests, pretests and posttests. Each of these has different characteristics that must be considered when the tests are planned.

If it is a take-home test rather than an in-class test, how do you make sure that students work independently, have equal access to sources and resources, or spend a sufficient but not enormous amount of time on the task? If it is a pretest, should it exactly match the posttest so that a gain score can be computed, or should the pretest contain items that are diagnostic of prerequisite skills and knowledge? If it is an achievement test, should partial credit be awarded, should there be penalties for guessing, or should points be deducted for grammar and spelling errors?

Obviously, the test plan must include a wide array of issues. Anticipating these potential problems allows the test constructor to develop positions or policies that are consistent with his or her testing philosophy. These can then be communicated to students, administrators, parents, and others who may be affected by the testing program.

Make a list of the objectives, the subject matter taught, and the activities undertaken. These are contained in the daily lesson plans of the teacher and in the references or textbook used. Such tests are usually very indirect methods that only approximate real-world applications. The constraints in classroom testing are often due to time and the developmental level of the students.

Test Length
A major decision in test planning is how many items should be included on the test.
There should be enough to cover the content adequately, but the length of the class period or the attention span or fatigue limits of the students usually restrict the test length. Decisions about test length are usually based on practical constraints more than on theoretical considerations.

Most teachers want test scores to be determined by how much the student understands rather than by how quickly he or she answers the questions. Thus, teachers prefer power tests, where at least 90 percent of the students have time to attempt 90 percent of the test items. Just how many items will fit into a given test occasion is something that is learned through experience with similar groups of students.

Item Formats
Determining what kind of items to include on the test is a major decision. Should they be objectively scored formats such as multiple choice or matching type? Should they cause the students to organize their own thoughts through short answer or essay formats? These are important questions that can be answered only by the teacher in terms of the local context, his or her students, his or her classroom, and the specific purpose of the test. Once the planning decisions are made, the item writing begins. This task is often the most feared by beginning test constructors. However, the procedures are more common sense than formal rules.

POINTS TO BE CONSIDERED IN PREPARING A TEST
1. Are the instructional objectives clearly defined?
2. What knowledge, skills and attitudes do you want to measure?
3. Did you prepare a table of specifications?
4. Did you formulate well defined and clear test items?
5. Did you employ correct English in writing the items?
6. Did you avoid giving clues to the correct answer?
7. Did you test the important ideas rather than the trivial?
8. Did you adapt the test's difficulty to your students' ability?
9. Did you avoid using textbook jargon?
10. Did you cast the items in positive form?
11. Did you prepare a scoring key?
12.
Does each item have a single correct answer?
13. Did you review your items?

GENERAL PRINCIPLES IN CONSTRUCTING DIFFERENT TYPES OF TESTS
1. The test items should be selected very carefully. Only important facts should be

3. Enumeration type
   a. The exact number of expected answers should be stated.
   b. Blanks should be of equal lengths.
   c. Score is the number of correct answers.

4. Identification type
   a. The items should make an examinee think of a word, number, or group of words that would complete the statement or answer the problem.
   b. Score is the number of correct answers.

B. RECOGNITION TYPES
1. True-false or alternate-response type
   a. Declarative sentences should be used.
   b. The number of "true" and "false" items should be more or less equal.
   c. The truth or falsity of the sentence should not be too evident.
   d. Negative statements should be avoided.
   e. The "modified true-false" is more preferable than the "plain true-false".
   f. In arranging the items, avoid the regular recurrence of "true" and "false" statements.
   g. Avoid using specific determiners like: all, always, never, none, nothing, most, often, some, etc., and avoid weak statements such as may, sometimes, as a rule, in general, etc.
   h. Minimize the use of qualitative terms like: few, great, many, more, etc.
   i. Avoid leading clues to answers in all items.
   j. Score is the number of correct answers in "modified true-false" and right answers minus wrong answers in "plain true-false".

2. Yes-No type
   a. The items should be in interrogative sentences.
   b. The same rules as in "true-false" are applied.

3. Multiple-response type
   a. There should be three to five choices. The number of choices used in the first item should be the same number of choices in all the items of this type of test.
   b. The choices should be numbered or lettered so that only the number or letter can be written on the blank provided.
   c. If the choices are figures, they should be arranged in ascending order.
   d. Avoid the use of "a" or "an" as the last word prior to the listing of the responses.
   e. Random occurrence of responses should be employed.
   f. The choices, as much as possible, should be at the end of the statements.
   g. The choices should be related in some way or should belong to the same class.
   h. Avoid the use of "none of these" as one of the choices.
   i. Score is the number of correct answers.

4. Best answer type
   a. There should be three to five choices, all of which are right but vary in their degree of merit, importance or desirability.
   b. The other rules for multiple-response items are applied here.
   c. Score is the number of correct answers.

5. Matching type
   a. There should be two columns. Under "A" are the stimuli, which should be longer and more descriptive than the responses under column "B". The response may be a word, a phrase, a number, or a formula.
   b. The stimuli under column "A" should be numbered and the responses under column "B" should be lettered. Answers will be indicated by letters only on lines provided in column "A".
   c. The number of pairs usually should not exceed twenty items. Fewer than ten introduces chance elements. Twenty pairs may be used, but more than twenty is decidedly wasteful of time.
   d. The number of responses in column "B" should be two or more than the number of items in column "A" to avoid guessing.
   e. Only one correct matching for each item should be possible.
   f. Matching sets should neither be too long nor too short.
   g. All items should be on the same page to avoid turning of pages in the process of matching pairs.
   h. Score is the number of correct answers.

C. ESSAY TYPE EXAMINATIONS
Common types of essay questions. (The types are related to the purposes for which the essay examinations are to be used.)
1. Comparison of two things
2. Explanation of the use or meaning of a statement or passage
3. Analysis
4. Decisions for or against
5. Discussion

How to construct essay examinations.
1.
Determine the objectives or essentials for each question to be evaluated.
2. Phrase questions in simple, clear and concise language.
3. Suit the length of the questions to the time available for answering the essay examination. The teacher should try to answer the test herself.
4. Scoring:
   a. Have a model answer in advance.
   b. Indicate the number of points for each question.
   c. Score a point for each essential.

ADVANTAGES AND DISADVANTAGES OF THE OBJECTIVE TYPE OF TESTS
Advantages
a. The objective test is free from personal bias in scoring.
b. It is easy to score. With a scoring key, the test can be corrected by different individuals without affecting the accuracy of the grades given.
c. It has high validity because it is comprehensive with wide sampling of essentials.
d. It is less time-consuming since many items can be answered in a given time.
e. It is fair to students since the slow writers can accomplish the test as fast as the fast writers.

Disadvantages
a. It is difficult to construct and requires more time to prepare.
b. It does not afford the students the opportunity in training for self and thought organization.
c. It cannot be used to test ability in theme writing or journalistic writing.

ADVANTAGES AND DISADVANTAGES OF THE ESSAY TYPE OF TESTS
Advantages
a. The essay examination can be used in practically all subjects of the school curriculum.
b. It trains students for thought organization and self-expression.
c. It affords students opportunities to express their originality and independence of thinking.
d. Only the essay test can be used in some subjects like composition writing and journalistic writing which cannot be tested by the objective type test.
e. Essay examination measures higher mental abilities like comparison, interpretation, criticism, defense of opinion and decision.
f. The essay test is easily prepared.
g. It is inexpensive.

Disadvantages
a.
The limited sampling of items makes the test an unreliable measure of achievements or abilities.
b. Questions usually are not well prepared.
c. Scoring is highly subjective due to the influence of the corrector's personal judgment.
d. Grading of the essay test is an inaccurate measure of pupils' achievements due to subjectivity of scoring.

STATISTICAL MEASURES OR TOOLS USED IN INTERPRETING NUMERICAL DATA

Frequency Distributions
A simple, common-sense technique for describing a set of test scores is through the use of a frequency distribution. A frequency distribution is merely a listing of the possible score values and the number of persons who achieved each score. Such an arrangement presents the scores in a more simple and understandable manner than merely listing all of the separate scores.

Consider a specific set of scores to clarify these ideas. A set of scores for a group of 25 students who took a 50-item test is listed in Table 1. It is easier to analyze the scores if they are arranged in a simple frequency distribution. (The frequency distribution for the same set of scores is given in Table 2.) The steps that are involved in creating the frequency distribution are: First, list the possible score values in rank order, from highest to lowest. Then, a second column indicates the frequency or number of persons who received each score. For example, three students received a score of 47, two received 40, and so forth. There is no need to list score values below the lowest score that anyone received.

Table 1. Scores of 25 Students on a 50-Item Test

Student   Score     Student   Score
A         48        N         43
B         50        O         47
C         46        P         48
D         41        Q         42
E         37        R         44
F         48        S         38
G         38        T         49
H         47        U         34
I         49        V         35
J         44        W         47
K         48        X         40
L         49        Y         48
M         40

Table 2.
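The steps just described are easy to automate. This Python sketch (using the standard library's Counter; variable names are illustrative) tallies the Table 1 scores into the simple frequency distribution shown in Table 2:

```python
from collections import Counter

# The 25 scores of Table 1, students A through Y
scores = [48, 50, 46, 41, 37, 48, 38, 47, 49, 44, 48, 49, 40,
          43, 47, 48, 42, 44, 38, 49, 34, 35, 47, 40, 48]

counts = Counter(scores)

# List possible score values in rank order, highest to lowest,
# stopping at the lowest score anyone received.
for value in range(max(scores), min(scores) - 1, -1):
    print(value, counts[value])
```

The loop deliberately includes score values with zero frequency (such as 45 and 39), matching the convention used in Table 2.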
Frequency Distribution of the 25 Scores of Table 1

Score   Frequency     Score   Frequency
50      1             41      1
49      3             40      2
48      5             39      0
47      3             38      2
46      1             37      1
45      0             36      0
44      2             35      1
43      1             34      1
42      1

When there is a wide range of scores in a frequency distribution, the distribution can be quite long, with a lot of zeros in the column of frequencies. Such a frequency distribution can make interpretation of the scores difficult and confusing. A grouped frequency distribution would be more appropriate in this kind of situation. Groups of score values are listed rather than each separate possible score value.

If we were to change the frequency distribution in Table 2 into a grouped frequency distribution, we might choose intervals such as 48-50, 45-47, and so forth. The frequency corresponding to interval 48-50 would be 9 (1+3+5). The choice of the width of the interval is arbitrary, but it must be the same for all intervals. In addition, it is a good idea to have an odd-numbered interval width (we used 3 above) so that the midpoint of the interval is a whole number. This strategy will simplify subsequent graphs and description of the data. The grouped frequency distribution is presented in Table 3.

Table 3. Grouped Frequency Distribution

Score Interval   Frequency
48-50            9
45-47            4
42-44            4
39-41            3
36-38            3
33-35            2

Frequency distributions summarize sets of test scores by listing the number of people who received each test score.
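The grouping can likewise be sketched in Python. The interval arithmetic below assumes the text's choices, a width of 3 with the lowest interval starting at 33; the names are illustrative:

```python
from collections import Counter

scores = [48, 50, 46, 41, 37, 48, 38, 47, 49, 44, 48, 49, 40,
          43, 47, 48, 42, 44, 38, 49, 34, 35, 47, 40, 48]

width = 3   # odd width, so each interval's midpoint is a whole number
base = 33   # lower limit of the lowest interval (33-35)

# Map each score to the index of its interval, then count per interval
grouped = Counter((s - base) // width for s in scores)

for idx in sorted(grouped, reverse=True):
    low = base + idx * width
    print(f"{low}-{low + width - 1}: {grouped[idx]}")
```

Running this reproduces Table 3: the 48-50 interval collects 9 scores, 45-47 and 42-44 collect 4 each, and so on.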
All of the test scores can be listed separately, or the scores can be grouped in a frequency distribution.

MEASURES OF CENTRAL TENDENCY
Frequency distributions are helpful for indicating the shape of a distribution of scores, but we need more information than the shape to describe a distribution adequately. We need to know where on the scale of measurement a distribution is located and how the scores are dispersed in the distribution. For the former, we compute measures of central tendency, and for the latter, we compute measures of dispersion. Measures of central tendency are points on the scale of measurement, and they are representative of how the scores tend to average. There are three commonly used measures of central tendency: the mean, the median, and the mode, but the mean is by far the most widely used.

The Mean
The mean of a set of scores is the arithmetic mean. It is found by summing the scores and dividing the sum by the number of scores. The mean is the most commonly used measure of central tendency because it is easily understood and is based on all of the scores in the set; hence, it summarizes a lot of information. The formula for the mean is as follows:

X̄ = ΣX / N

where X̄ is the mean, X is the symbol for a score, Σ is the summation operator (it tells us to add all the Xs), and N is the number of scores. For the set of scores in Table 1, ΣX = 1100 and N = 25, so:

X̄ = 1100 / 25 = 44

The mean of the set of scores in Table 1 is 44. The mean does not have to equal an observed score; it is usually not even a whole number. When the scores are arranged in a grouped frequency distribution, the formula is:

X̄ = Σ(f · Xm) / N

where f · Xm means that the midpoint of the interval (Xm) is multiplied by the frequency (f) for that interval. In computing the mean for the scores in Table 3, using this formula we obtain:

X̄ = [9(49) + 4(46) + 4(43) + 3(40) + 3(37) + 2(34)] / 25 = 43.84

Note that this mean is slightly different from the mean using ungrouped data.
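The same arithmetic in a short Python sketch, with the score list transcribed from Table 1 and the midpoints and frequencies from Table 3:

```python
scores = [48, 50, 46, 41, 37, 48, 38, 47, 49, 44, 48, 49, 40,
          43, 47, 48, 42, 44, 38, 49, 34, 35, 47, 40, 48]

# Ungrouped mean: sum of the scores divided by N
mean = sum(scores) / len(scores)
print(mean)  # 44.0

# Grouped mean from Table 3: each interval's midpoint times its frequency
intervals = {49: 9, 46: 4, 43: 4, 40: 3, 37: 3, 34: 2}  # midpoint: frequency
grouped_mean = sum(m * f for m, f in intervals.items()) / 25
print(grouped_mean)  # 43.84
```

The small discrepancy (44 versus 43.84) is exactly the point made in the text: the grouped formula lets the midpoint stand in for every score in its interval.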
This difference is due to the midpoint representing the scores in the interval rather than using the actual scores.

The Median
Another measure of central tendency is the median, which is the point that divides the distribution in half; that is, half of the scores fall above the median and half of the scores fall below the median.

When there are only a few scores, the median can often be found by inspection. If there is an odd number of scores, the middle score is the median. When there is an even number of scores, the median is halfway between the two middle scores. However, when there are tied scores in the middle of the distribution, or when the scores are in a frequency distribution, the median may not be so obvious.

Consider again the frequency distribution in Table 2. There were 25 scores in the distribution, so the middle score should be the median. A straightforward way to find this median is to augment the frequency distribution with a column of cumulative frequencies. Cumulative frequencies indicate the number of scores at or below each score. Table 4 indicates the cumulative frequencies for the data in Table 2.

Table 4. Frequency Distribution, Cumulative Frequencies for the Scores of Table 2

Score   Frequency   Cumulative Frequency
50      1           25
49      3           24
48      5           21
47      3           16
46      1           13
45      0           12
44      2           12
43      1           10
42      1           9
41      1           8
40      2           7
39      0           5
38      2           5
37      1           3
36      0           2
35      1           2
34      1           1

For example, 7 persons scored at or below a score of 40, and 21 persons scored at or below a score of 48. To find the median, we need to locate the middle score in the cumulative frequency column, because this score is the median. Since there are 25 scores in the distribution, the middle one is the 13th, a score of 46. Thus, 46 is the median of this distribution; half of the people scored above 46 and half scored below. When there are ties in the middle of the distribution, there may be a need to interpolate between scores to get the exact median.
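The cumulative-frequency procedure can be sketched in Python (names are illustrative; the score list is transcribed from Table 1). Walking up the scale and accumulating frequencies reproduces Table 4's cumulative column, and the first score whose cumulative count reaches the middle position is the median:

```python
from collections import Counter

scores = [48, 50, 46, 41, 37, 48, 38, 47, 49, 44, 48, 49, 40,
          43, 47, 48, 42, 44, 38, 49, 34, 35, 47, 40, 48]

counts = Counter(scores)
middle = (len(scores) + 1) // 2  # the 13th score of 25

# Accumulate frequencies from the lowest score upward until the
# cumulative count reaches the middle position.
cumulative = 0
median = None
for value in sorted(counts):
    cumulative += counts[value]
    if cumulative >= middle:
        median = value
        break

print("median:", median)  # median: 46
```

Note this gives the whole-number median the text settles for; it does not interpolate between tied middle scores.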
However, such precision is not needed for most classroom tests. The whole number closest to the median is usually sufficient.

The Mode
The measure of central tendency that is the easiest to find is the mode. The mode is the most frequently occurring score in the distribution. The mode of the scores in Table 1 is 48. Five persons had scores of 48, and no other score occurred as often.
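A one-line check of the mode, again using the standard library's Counter on the Table 1 scores:

```python
from collections import Counter

scores = [48, 50, 46, 41, 37, 48, 38, 47, 49, 44, 48, 49, 40,
          43, 47, 48, 42, 44, 38, 49, 34, 35, 47, 40, 48]

# most_common(1) returns the (score, frequency) pair with the highest count
mode, frequency = Counter(scores).most_common(1)[0]
print(mode, frequency)  # 48 5
```

If two scores were tied for the highest frequency, the distribution would be bimodal; `most_common(1)` would report only one of them, so a tie check would be needed in that case.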
