
COURSE SYLLABUS
TYPES OF DATA

A variable is a characteristic, description, or attribute of persons or objects which assumes different values or labels.

Examples:

a) Height, age, and weight are variables which assume numerical responses or values.
Height: 5 feet and 4 inches
Weight: 120 kilograms
Age: 20 years old
b) Gender, religious affiliation, and civil status are also variables, which assume not values but different labels or categories.
Gender: Male (M) Female (F)
Religious Affiliation: Roman Catholic Adventist
Mormons Protestant
Civil Status: Single Married
Separated Widow/er

Variables are generally classified into two, namely qualitative and quantitative
variables. A qualitative variable yields categorical responses while a quantitative
variable yields numerical responses representing an amount or quantity.

Examples:

Civil status and religious affiliation are qualitative variables.


Number of children in the family, blood pressure, and temperature are
quantitative variables.

Quantitative variables, on the other hand, can either be discrete or continuous. A discrete quantitative variable assumes a finite or countably infinite set of values such as 0, 1, 2, 3, … and is usually obtained through the process of counting. A continuous quantitative variable assumes values which are associated with points on an interval of the number line. These are usually obtained through the process of measurement with corresponding units.

Examples:

a.) Number of students and number of patients are discrete quantitative variables.
b.) Height, weight, and temperature are continuous quantitative variables.

Variables can be also classified according to their levels of measurement. These
are scales of measuring data.

A nominal data is the crudest form of data. It uses numbers or symbols for the
purpose of categorizing subjects into groups or categories, which are mutually exclusive.
Thus, being in one category automatically excludes one from being a member of another
category. Moreover, the categories are exhaustive, that is, all possible categories of a variable should be included.

Examples:

a.) Gender can be categorized as either
M - Male
F - Female
Thus, if an individual is a member of the male group then he cannot be a member of the female group at the same time.

b.) College year level can be categorized as:


I – First Year
II – Second Year
III – Third Year
IV – Fourth Year

An ordinal data possesses all the properties of the nominal data. Hence, it can be said that an ordinal data is an improvement of the nominal data because here the data are ranked or ordered in a somewhat "bottom to top" or "low to high" scheme.

Examples:

a.) Student’s class standing is an ordinal data. These are categorized into:
5 – Excellent
4 – Very Good
3 – Good
2 – Fair
1 - Poor

b.) Pain assessment is also an ordinal data, which is categorized as:


0 – No Pain
1 - Moderately Painful
2 – Severely Painful
3 – Very Painful

An interval data possesses all the properties of the nominal and ordinal data. Here, the data are numeric in nature and the distances between any two numbers are known. However, the interval data, although numeric, does not have a true or absolute zero point.

Examples:

Consider the IQ of four students: 70, 140, 75, and 145.

Here we can say that the difference between 140 and 70 is the same as the difference between 145 and 75. But we cannot claim that the second student is twice as intelligent as the first. Is there such a thing as a zero IQ?

A ratio data possesses all the properties of the nominal, ordinal and interval data. It is also numeric in nature and has an absolute zero point. Thus, with ratio data, we can classify and order/rank the values, and likewise we can also compare their magnitudes.

Examples:

Age, income and scores are examples of ratio data.

There are also other classifications of data. Raw data are those which are in their original form and structure. Responses from surveys, taped interviews, and recorded observations are examples of raw data. Grouped data, on the other hand, are those placed and summarized in tabular form.

METHODS OF DATA COLLECTION

In statistical investigations, there are many ways of collecting data. None of these methods is the best in every case because the choice of the appropriate method largely depends on several factors, which include the definition of the problem, the research design, the time element of data collection, and the cooperation of the respondents.

The observation method is the simplest data collection technique. Here, the data are obtained by merely observing the behaviour of persons or objects, but only at a particular time of occurrence. The data obtained are called observational data.

The experimental method is especially useful when one wants to collect data for cause-and-effect studies under controlled conditions. In this method, there is actual interference with the conditions and situations that can affect the variable under study. The data obtained in this method are called experimental data.

In the registration method, the respondents provide the necessary information in compliance with existing laws. For instance, data can be derived from card registration, birth registration, voter's registration and marriage registration.

The use of existing studies also provides an archival method of data collection. In this method, the primary source is the source in which the data are measured or gathered by the researcher or agency that published it. The secondary source, on the other hand, is the source from which any republication of data is made by another agency.

In the survey method, the desired information is obtained through asking questions. The survey method may either be the direct or personal interview method or the indirect or questionnaire method.

In the direct or personal interview method, there is a person-to-person contact between the interviewer and the interviewee. This is considered one of the most effective methods of data collection because accurate and precise information can be obtained and verified from the respondents. Moreover, this has a higher response rate but can only be administered to the respondents one at a time.

The indirect or questionnaire method is considered the easiest method of data collection through the use of a questionnaire as a data-gathering tool. Unlike the direct method, this method has a lower response rate but can be administered to a large number of respondents simultaneously.

RAW SCORES
RAW SCORES OF 52 STUDENTS IN AN ACHIEVEMENT TEST

163 180 148 156 168 172

177 193 142 152 167 178

189 162 157 161 167 173

176 188 198 158 151 162

167 186 143 164 169 171

182 157 171 197 159 168

153 147 136 161 166 162

172 163 173 183 179 173

165 156 165 167

RAW SCORES OF 100 PUPILS IN A VOCABULARY TEST

23 36 40 31

29 15 34 36

31 24 40 45

34 57 20 45

16 33 37 37

12 27 14 43

36 41 41 52

39 25 46 49

34 22 21 40

27 18 35 40

39 30 41 42

51 38 16 27

24 26 32 34

26 38 46 41

40 30 45 44

38 29 34 35

32 19 18 26

37 38 32 50

43 29 25 29

37 41 51 35

48 40 34 31

33 43 37 46

34 34 38 20

40 14 31 32

33 42 38 59

THE MASTER SHEET

The master sheet or classifier is a device used in arranging scores or statistical data. With it, scores are easily and conveniently arranged from the highest to the lowest, or vice versa. The frequency of each score is easily determined. It is a preparatory step to the ranking and grouping of scores.

Procedures in classifying scores on the master sheet:

1. Determine the highest and the lowest scores.

2. Subtract the tens of the lowest score from the tens of the highest score. Add 4 (constant) to the difference to determine the number of horizontal lines.

3. Draw 13 (constant) vertical lines and as many horizontal lines as computed in step 2.

4. Write in the horizontal cells the units 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and "Total", starting from the second cell; and in the left vertical cells, "Total" and the tens (the ten of the lowest score, followed by the second ten, and so on up to the ten of the highest score), starting from the bottom.

5. Tally the raw scores in the cell where they fall. The first digit or
digits of a score are represented by the vertical tens and the last
digit by the horizontal units.

6. Count the tallies in every cell and write the total frequencies corresponding to the tens and the units. Add the total frequencies of the tens and of the units. The sums must be equal. (These sums correspond to the number of raw scores or cases under consideration.)

EDUCATION 602 - Statistics

Name Course Date

THE MASTER SHEET OR CLASSIFIER


Exercise No.1

U N I T S

0 1 2 3 4 5 6 7 8 9 TOTAL

T 19 I I I 3

18 I I I I I I 6

E 17 II II III I I I I 11

16 II III II I II I IIII II I 18

N 15 I I I II II I I 9

14 I I I I 4

S 13 I 1

TOTAL 1 5 8 9 1 2 6 9 7 4 N=52

H.S. = 198 – 19t
L.S. = 136 – 13t
Difference = 6t
+ Constant 4
Horizontal Lines = 10
Constant: 13 vertical lines

EDUCATION 602 - Statistics

Name Course Date

THE MASTER SHEET OR CLASSIFIER


Exercise No.1

U N I T S

0 1 2 3 4 5 6 7 8 9 TOTAL

5 I II I I I 6

N 4 IIII-II IIII II III I III III I I 26

3 II IIII IIII III IIII-III III III IIII IIII-I II 40

2 II I I I II II III III IIII 19

1 I II I II II I 9

TOTAL 12 12 9 7 13 9 11 9 9 9 N=100

H.S. = 59 – 5t
L.S. = 12 – 1t
Difference = 4t
+ Constant 4
Horizontal Lines = 8
Constant: 13 vertical lines

RANKING OF SCORES

Ranking is the relative placement or arrangement of measures in a series according to magnitude, value, or quality, from the lowest to the highest or vice versa. It does not take into account the size of the difference between any two successive measures. The ranks are successive and continuous but the differences vary. Moreover, the scores are only indications of the achievements of pupils. They are not exact measures; they signify more or less the accomplishments of pupils. Hence, the positions of pupils indicated by their ranks are relative, not absolute.

From the ranks, it is possible to distinguish the bright from the dull and the mediocre pupils. It is possible to determine the percentage of pupils that surpass a given pupil and the percentage surpassed by him. Ranks are also used in the computation of the coefficient of correlation.

Procedure in the ranking of scores or measures:

1. By using the master sheet, arrange the scores from the highest to the
lowest, writing each score as many times as it appears.

2. Number the scores consecutively, giving the highest score tentative rank 1, the next highest 2, and so on to the lowest score. (The tentative rank of the lowest score is equal to the total number of cases, N.)

3. Assign the tentative ranks as the real ranks of scores that appear only once. Scores appearing more than once take the average of their ordinal numbers (tentative ranks) as their real ranks. (Identical or similar scores have equal ranks.)

EDUCATION 602- Statistics

Name Course Date

THE RANKING OF SCORES


Exercise No.2

SCORES TR RR SCORES TR RR

198   1   1        165  29  29.5
197   2   2        165  30  29.5
193   3   3        164  31  31
189   4   4        163  32  32.5
188   5   5        163  33  32.5
186   6   6        162  34  35
183   7   7        162  35  35
182   8   8        162  36  35
180   9   9        161  37  37.5
179  10  10        161  38  37.5
178  11  11        159  39  39
177  12  12        158  40  40
176  13  13        157  41  41.5
173  14  15        157  42  41.5
173  15  15        156  43  43.5
173  16  15        156  44  43.5
172  17  17.5      153  45  45
172  18  17.5      152  46  46
171  19  19.5      151  47  47
171  20  19.5      148  48  48
169  21  21        147  49  49
168  22  22.5      143  50  50
168  23  22.5      142  51  51
167  24  25.5      136  52  52
167  25  25.5      N = 52
167  26  25.5
167  27  25.5      Legend:
166  28  28        TR = Tentative Rank
                   RR = Real Rank

EDUCATION 602- Statistics

Name Course Date

The Ranking Scores


Exercise No.2

Scores TR RR Scores TR RR Scores TR RR

59 1 1 37 41 26 80

57 2 2 37 42 26 81 81

52 3 3 37 43 43 26 82

51 4 37 44 25 83
4.5 83.5
51 5 37 45 25 84

50 6 6 36 46 24 85
85.5
49 7 7 36 47 47 24 86

47 8 8 36 48 23 87 87

46 9 10 35 49 50 22 88 88

46 10 35 50 21 89 89

46 11 35 51 20 90 90.5

45 12 34 52 20 91

45 13 13 34 53 19 92 92

45 14 34 54 18 93
93.5
44 15 15 34 55 18 94
55.5
43 16 34 56 16 95
95.5
43 17 17 34 57 16 96

43 18 34 58 15 97 97

42 19 34 59 14 98
19.5 98.5
42 20 33 60 14 99

41 21 33 61 61 12 100 100

41 22 33 62 N = 100

41 23 23 32 63

41 24 32 64
64.5
41 25 32 65

40 26 32 66

40 27 31 67

40 28 31 68 68.5

40 29 29 31 69

40 30 31 70

40 31 30 71 71.5

40 32 30 72 Legend:

39 33 29 73 TR = Tentative Rank
33.5
39 34 29 74 RR= Real Rank
74.5
38 35 29 75

38 36 37.5 29 76

38 37 27 77 78

38 38 27 78

38 39 27 79

38 40

THE SCORE DISTRIBUTION

The grouping of scores in a score distribution is resorted to when there are few cases, not more than 30.

Grouping scores in a score distribution, as well as in a step or frequency distribution, makes the data partly meaningful. At a glance the recurrence or frequency of a score is seen, that is, how many times a certain score appears. Where most of the scores cluster is also evident.

The distribution also indicates whether the examination is easy, difficult, or of moderate difficulty. If most of the scores are high, the examination is relatively easy; if most of the scores are low, the examination is difficult; if most of the scores are found in the center of the distribution, the examination is of moderate difficulty.

Scores are grouped in a score or step distribution to economize space and to facilitate the computation of statistical measures like the median and the arithmetic mean taken up in later exercises.

Procedures in grouping scores or measures in a score distribution:

1. By using the master sheet, arrange the scores from the highest to the
lowest, writing each score only once.

2. Take the raw scores and place a tally after each score as many times as
the score appears.

3. Count the tallies opposite each score and write the number opposite the
tallies themselves. This number of tallies is the frequency (f) of the score.

4. Add the frequencies and write the sum at the bottom of the tabulation to get N, the total number of scores or cases.

EDUCATION 602 – Statistics

Name Course Date

Score Distribution
Exercise No. 3

Scores Tallies Freq. Scores Tallies Freq. Scores Tallies Freq.

198  I    1      173  III   3      158  I    1
197  I    1      172  II    2      157  II   2
193  I    1      171  II    2      156  II   2
189  I    1      169  I     1      153  I    1
188  I    1      168  II    2      152  I    1
186  I    1      167  IIII  4      151  I    1
183  I    1      166  I     1      148  I    1
182  I    1      165  II    2      147  I    1
180  I    1      164  I     1      143  I    1
179  I    1      163  II    2      142  I    1
178  I    1      162  III   3      136  I    1
177  I    1      161  II    2
176  I    1      159  I     1
     13                26                13

SUMMARY:

I = 13
II = 26
III = 13
N = 52

EDUCATION 602 – Statistics

Name Course Date

Score Distribution
Exercise No.3

Scores Tallies Freq. Scores Tallies Freq. Scores Tallies Freq.

59 I 1 40 IIII-II 7 26 III 3

57 I 1 39 II 2 25 II 2

52 I 1 38 IIII-I 6 24 II 2

51 II 2 37 IIII 5 23 I 1

50 I 1 36 III 3 22 I 1

49 I 1 35 III 3 21 I 1

48 I 1 34 IIII-III 8 20 II 2

46 III 3 33 III 3 19 I 1

45 III 3 32 IIII 4 18 II 2

44 I 1 31 IIII 4 16 II 2

43 III 3 30 II 2 15 I 1

42 II 2 29 IIII 4 14 II 2

41 IIII 5 27 III 3 12 I 1

25 54 21

SUMMARY:

I = 25

II = 54

III = 21

N = 100

THE FREQUENCY DISTRIBUTION

Data collected from tests and experiments may have little meaning to the investigator until they have been arranged or classified in some systematic way. The first task therefore is to organize our materials, and this leads naturally to a grouping of scores into classes or steps.

Procedures in grouping scores or measures into a frequency distribution:

1. Determine the range. The range is the gap between the highest and the
lowest scores- the difference that results when the lowest score is
subtracted from the highest score.

2. Determine the class interval. (a) To minimize the error and to avoid too much labor, it is suggested that the number of steps should not be less than 10 nor more than 20. The ideal number should be between 12 and 15. In exceptional cases the number which a given range yields may be below 10 or more than 20. A good rule is to select an odd number for the class interval (i) which will give a quotient of between ten and fifteen when the range is divided by it. Be sure the interval chosen will not spread the data out too much, thus losing the benefit of grouping, nor crowd the scores into coarse categories. (b) Another method of determining i (by Ross): add 1 (constant) to the range and divide the sum by 12 (constant).

3. Determine the limits of the classes or steps. For the lower limit of the highest step, choose a number which is nearest to or equal to the highest score, but not exceeding it, and which is exactly divisible by the size of the class interval. The upper limit is determined by adding to the lower limit one number less than i. The succeeding limits are determined by subtracting the size of i from the preceding lower and upper limits.

4. Make the tabulation. Tally the raw scores opposite their proper interval or class. The total number of tallies on each class interval (its frequency) is written in a column labelled f. The sum of the f column is called N (the number of cases).

EDUCATION 602 – Statistics

Name Course Date

The Frequency Distribution


Exercise No.4

Scores Tallies Frequencies

195-199 II 2

190-194 I 1

185-189 III 3

180-184 III 3

175-179 IIII 4

170-174 IIII – II 7

165-169 IIII – IIII 10

160-164 IIII – III 8

155-159 IIII – I 6

150-154 III 3

145-149 II 2

140-144 II 2

135-139 I 1

N = 52

H = 198          Range = 198 - 136 = 62
L = 136          62 + 1 = 63
                 63 / 12 = 5, remainder 3          i = 5

Rule No. 1: If the quotient is an odd number, it automatically becomes the interval (i).

EDUCATION 602 – Statistics

Name Course Date

Frequency Distribution
Exercise No.4

Scores   Tallies              Frequencies

57-59    II                   2
54-56                         0
51-53    III                  3
48-50    III                  3
45-47    IIII-I               6
42-44    IIII-I               6
39-41    IIII-IIII-IIII       14
36-38    IIII-IIII-IIII       14
33-35    IIII-IIII-IIII       14
30-32    IIII-IIII            10
27-29    IIII-II              7
24-26    IIII-II              7
21-23    III                  3
18-20    IIII                 5
15-17    III                  3
12-14    III                  3
                              N = 100

H = 59          Range = 59 - 12 = 47
L = 12          47 + 1 = 48
                48 / 12 = 4, no remainder          i = 3

Rule No. 2: If the quotient is even, without any remainder, take the preceding odd number.

THE FREQUENCY POLYGON

Aid in analysing numerical data is obtained from a graphic or pictorial treatment of the frequency distribution. The advertiser has long used graphic methods because these devices catch the eye and hold the attention when the most careful array of statistical evidence fails to attract notice. For this and other reasons the research worker also utilizes the attention-getting power of visual presentation and, at the same time, seeks to translate numerical facts – often abstract and difficult to interpret – into a more concrete and understandable form.

Four methods of representing a frequency distribution graphically are in use: the frequency polygon, the histogram, the cumulative frequency graph, and the cumulative percentage curve or ogive.

The frequency polygon and the histogram have the same uses. The score opposite the summit of the frequency polygon is the crude mode; the midpoint of the top of the highest rectangle of the histogram is a crude mode. From the frequency polygon and the histogram, it is possible to glean the "representativeness" of the group concerned. If the graph plotted is similar to the shape of a bell, the group is more or less typical. The more irregular the shape of the graph, the less representative is the group. It is also possible to note the tendency of the measures - whether they are piled up at the low (or high) end of the scale or are evenly and regularly distributed over the scale. If the test is too easy, the scores accumulate at the high end of the scale, whereas if the test is too hard, scores will crowd at the low end of the scale. When the test is of moderate difficulty, the scores will be distributed symmetrically around the mean, a few individuals scoring quite high, a few quite low, and the majority falling somewhere near the middle of the scale.

The frequency polygon is less precise than the histogram in that it does not
represent accurately, i.e. in terms of areas, the frequency upon each interval. In
comparing two or more graphs plotted on the same axes, however, the frequency
polygon is likely to be more useful as the vertical and horizontal lines in the histogram
will often coincide.

Procedure in plotting a frequency polygon:

1. Labelling the points on the base line. There are several ways of labelling the intervals along the base line (X axis) of the frequency polygon. For example, step 195-199 of our frequency distribution, which has an i of 5, may be interpreted as having the following limits:

a. Expressed limits: 195-199. This means that this interval begins with the score 195 and ends with the score 199. These limits are ideal for tallying the scores in a frequency distribution because of the ease in tallying.

b. Score limits: 195-200. This interval means that scores from 195 up to but not including 200 fall within this grouping. These limits are conveniently used in labelling the points on the base line of the frequency polygon, but inconveniently used in tallying because it is fairly easy for one to let the score 200 slip into the interval 195-200 owing simply to the presence of 200 at the upper limit of the interval.

c. Exact limits: 194.5-199.5. This interval begins exactly at 194.5 (not at 195) and ends at 199.5 (not at 199). Apparently, this is time-consuming and clumsy. However, exact limits may be used in computation (rather than in labelling the points on the base line of the graph).

2. Plotting midpoints. Frequencies on each interval are plotted above the


midpoints of the intervals on the X axis. They are represented in each
instance by a dot the specified distance up on Y and midway between the
lower and upper limits of the interval upon which it falls.

3. Drawing the frequency polygon. When all the points have been located in the diagram, they are joined by a series of short lines to form the frequency polygon. In order to complete the figure (i.e., to bring it down to the base line), one additional interval at the low end and one additional interval at the high end of the distribution are included on the X scale. The frequency on each of these intervals is, of course, zero.

4. Dimensions of the frequency polygon. In order to give symmetry and balance to the polygon, care must be exercised in the selection of unit distances to represent the intervals on the X axis and the frequencies on the Y axis. A good general rule is to select X and Y units which will make the height of the figure about 60-80% of its width.

5. Area of the polygon. The total frequency (N) of a distribution is represented by the area of its polygon; that is, the area bounded by the frequency surface and the X axis.

Steps in constructing a frequency polygon:

1. Draw two straight lines perpendicular to each other, the vertical line near left side
of the paper, the horizontal line near the bottom. Label the vertical line (the Y
axis) OY, and the horizontal line (the X axis) OX. Put 0 where the two lines
intersect. This point is the origin.
2. Lay off the score intervals of the frequency distribution at regular distances along the X axis. Begin with the interval next below the lowest in the distribution, and end with the interval next above the highest in the distribution. Label the successive X distances with the score-interval limits. Select an X unit which will allow all the intervals to be represented easily on the graph paper.
3. Mark off on the Y axis successive units to represent the scores (the frequencies) on the different intervals. Choose a Y scale which will make the largest frequency (the height of the polygon) approximately 75%, or 60-80%, of the width of the figure.
4. At the midpoint of each interval on the X axis go up in the Y direction a distance equal to the number of scores on the interval. Place points at these locations.
5. Join the points plotted in step 4 with straight lines to give the frequency surface.

The Frequency Polygon

Exercise No.5

Polygon - from the Greek term polygonon, which means "many-angled".

Polygon - a figure, especially a closed plane figure, having 3 or more straight sides.

-visual aids

-attention getting

-catch the eye

-seeks to translate numerical facts

Four Methods in general use:

1) frequency polygon

2) histogram

3) cumulative frequency graph

4) cumulative percentage curve or ogive

Three types of limits:

1. Expressed limits- limits ideal for tallying

140- 144 *expressly tells us what is included in a class.

135-139

2. Score limits- score added- limits ideal for labelling for graphs

140-145 (140-144)

135-140 (135-139)

3. Exact limits- this is expressed in 0.5 or decimals

139.5-144.5 (lower limit is .5 less) 140-145

134.5-139.5 (upper limit is .5 more) 135-140

Purpose is for (1)labeling and for

(2) computation

EDUCATION 602 – Statistics

Name Course Date

The Frequency Polygon

Exercise No.5

(Figure: frequency polygon of the 52 achievement-test scores. X axis = scores, labelled 130 to 205 in steps of 5; Y axis = frequency, 0 to 10.)

Expressed Limits        Score Limits
Scores        F         Scores        F
200-204       0         200-204       0
195-199       2         195-199       2
190-194       1         190-194       1
185-189       3         185-189       3
180-184       3         180-184       3
175-179       4         175-179       4
170-174       7         170-174       7
165-169      10         165-169      10
160-164       8         160-164       8
155-159       6         155-159       6
150-154       3         150-154       3
145-149       2         145-149       2
140-144       2         140-144       2
135-139       1         135-139       1
130-134       0         130-134       0
N = 52                  N = 52

Orig. Steps = 13        Added steps = 2        Constant = 1
Vertical Lines = 13 + 2 + 1 = 16
Horizontal Lines = 16 x .75 = 12
Frequency units per line = 10 / 12 = 0.83

EDUCATION 602 – Statistics

Name Course Date

The Frequency Polygon


Exercise No.5

Expressed Limits        Score Limits
Scores     Freq.        Scores     Freq.
60-62        0          60-62        0
57-59        2          57-59        2
54-56        0          54-56        0
51-53        3          51-53        3
48-50        3          48-50        3
45-47        6          45-47        6
42-44        6          42-44        6
39-41       14          39-41       14
36-38       14          36-38       14
33-35       14          33-35       14
30-32       10          30-32       10
27-29        7          27-29        7
24-26        7          24-26        7
21-23        3          21-23        3
18-20        5          18-20        5
15-17        3          15-17        3
12-14        3          12-14        3
9-11         0          9-11         0
N = 100                 N = 100

Orig. Steps = 16        Added steps = 2        Constant = 1
Vertical Lines = 16 + 2 + 1 = 19
Horizontal Lines = .75 of vertical lines = 19 x .75 = 14.25
Frequency units per line = 14 / 14.25 = 0.98, or 1 (converted into a whole number)

(Figure: frequency polygon of the 100 vocabulary-test scores. X axis = scores, labelled 9 to 63 in steps of 3; Y axis = frequency, 0 to 15.)

ORIGINAL AND SMOOTHED FREQUENCY POLYGONS

If the sample is small and the frequency distribution is somewhat irregular, the polygon tends to be jagged in outline. To iron out chance irregularities, and also to get a better notion of how the figure might look if the data were more numerous, the frequency polygon may be "smoothed". In smoothing, a series of "moving" or "running" averages is taken, from which new or adjusted frequencies are determined.

Procedure in smoothing a frequency polygon:

1. Find the adjusted or "smoothed" f of every class interval (including the one additional step next below the lowest in the distribution and the one additional step next above the highest in the distribution) by adding the f on the given interval and the f's on the two adjacent intervals (the interval just below and the interval just above) and dividing the sum by three (3). (The total of all the adjusted frequencies should equal the number of cases.)

2. On the same graph paper on which the original frequency polygon has been constructed, place a point at the midpoint of each interval on the X axis corresponding to the adjusted f in the Y direction.

3. Join the points plotted in step 2 with straight or dotted lines (using a different ink color) to complete the smoothed polygon.

EDUCATION 602 – Statistics

Name Course Date

The Original and Smoothed Frequency Polygons


Exercise No.6

(Figure: original and smoothed frequency polygons of the 52 achievement-test scores. X axis = scores, 130 to 205 in steps of 5; Y axis = frequency, 0 to 10.)

Expressed Limits     Score Limits      Adjusted
Scores       F       Scores       F    Frequencies
200-204      0       200-204      0      2/3
195-199      2       195-199      2    1
190-194      1       190-194      1    2
185-189      3       185-189      3    2 1/3
180-184      3       180-184      3    3 1/3
175-179      4       175-179      4    4 2/3
170-174      7       170-174      7    7
165-169     10       165-169     10    8 1/3
160-164      8       160-164      8    8
155-159      6       155-159      6    5 2/3
150-154      3       150-154      3    3 2/3
145-149      2       145-149      2    2 1/3
140-144      2       140-144      2    1 2/3
135-139      1       135-139      1    1
130-134      0       130-134      0      1/3
N = 52               N = 52            Total = 52

EDUCATION 602 – Statistics

Name Course Date

The Original and Smoothed Frequency Polygons


Exercise No.6

Expressed Limits     Score Limits      Adjusted
Scores    Freq.      Scores    Freq.   Frequencies
60-62       0        60-62       0       2/3
57-59       2        57-59       2       2/3
54-56       0        54-56       0     1 2/3
51-53       3        51-53       3     2
48-50       3        48-50       3     4
45-47       6        45-47       6     5
42-44       6        42-44       6     8 2/3
39-41      14        39-41      14    11 1/3
36-38      14        36-38      14    14
33-35      14        33-35      14    12 2/3
30-32      10        30-32      10    10 1/3
27-29       7        27-29       7     8
24-26       7        24-26       7     5 2/3
21-23       3        21-23       3     5
18-20       5        18-20       5     3 2/3
15-17       3        15-17       3     3 2/3
12-14       3        12-14       3     2
9-11        0        9-11        0     1
N = 100              N = 100           Total = 100

(Figure: original and smoothed frequency polygons of the 100 vocabulary-test scores. X axis = scores, 9 to 63 in steps of 3; Y axis = frequency, 0 to 15.)

THE HISTOGRAM OR COLUMN DIAGRAM

In the frequency polygon, all the scores within a given interval are represented by the midpoint of that interval, whereas in a histogram the scores are assumed to be spread uniformly over the entire interval. Within each interval of a histogram the frequency is shown by a rectangle, the base of which is the length of the interval, and the height of which is the number of scores within the interval.

Procedures of constructing a histogram or column diagram:

1. Draw OX and OY as in the frequency polygon and lay off equal distance
on both axes – OX for the steps and OY for the frequencies.

2. Lay off the score intervals of the frequency distribution along the X-axis.
Begin with the highest interval in the distribution.

3. Mark off on the Y axis successive units to represent the frequencies on


the different intervals.

4. Draw a horizontal line limited by the lower and upper limits of each step, instead of a point at the midpoint as in the frequency polygon.

5. Connect by straight vertical lines every two adjacent ends of the lines. The figure is a histogram.

6. Shade the histogram to bring out clearly the total area of the figure.
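A hedged matplotlib sketch of the construction (matplotlib is an assumption, not part of the manual); each rectangle is drawn from the exact lower limit of its step with a width equal to the class interval, using the 52-score distribution.

import matplotlib.pyplot as plt

lower_limits = [135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195]
freqs        = [1,   2,   2,   3,   6,   8,   10,  7,   4,   3,   3,   1,   2]
i = 5

edges = [ll - 0.5 for ll in lower_limits]   # exact lower limit of each step
plt.bar(edges, freqs, width=i, align="edge", edgecolor="black")
plt.xlabel("Scores")
plt.ylabel("Frequency")
plt.title("Histogram (column diagram)")
plt.show()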

EDUCATION 602 – Statistics

Name Course Date

The Histogram or Column Diagram


Exercise No.7

(Figure: histogram or column diagram of the 52 achievement-test scores. X axis = scores, 135 to 200 in steps of 5; Y axis = frequency, 0 to 10; the 52 cases are numbered consecutively inside the columns.)

Score Freq.

Vertical Lines:

195-199 2 2 lines less than in

190-194 1 The polygon

185-189 3

180-184 3 Original Steps = 13

175-179 4 Constant + 1

170-174 7 14

165-169 10

160-164 8 Horizontal Lines:

155-159 6 Same as in the

150-154 3 frequency

145-149 2 polygon

140-144 2

135-139 1

130-134 0

N=52

EDUCATION 602 – Statistics

Name Course Date

The Histogram or Column Diagram


Exercise No.7
(Figure: histogram or column diagram of the 100 vocabulary-test scores. X axis = scores, 12 to 60 in steps of 3; Y axis = frequency, 0 to 14; the 100 cases are numbered consecutively inside the columns.)

Score Freq.

Vertical Lines:

57-59 2 2 lines less than in

54-56 0 the polygon

51-53 3

48-50 3 Original Steps = 16

45-47 6 Constant + 1

42-44 6 17

39-41 14

36-38 14 Horizontal Lines:

33-35 14 Same as in the

30-32 10 frequency

27-29 7 polygon

24-26 7

21-23 3

18-20 5

15-17 3

12-14 3

N=100

MEASURES OF CENTRAL TENDENCY

When scores or other measures have been tabulated into a frequency distribution, usually the next task is to calculate a measure of central tendency, or central position. The value of a measure of central tendency is twofold: (1) it is an "average" which represents all of the scores made by the group, and as such gives a concise description of the performance of the group as a whole; and (2) it enables us to compare two or more groups in terms of typical performance.

There are three "averages" or measures of central tendency in common use: the arithmetic mean, the median, and the mode. The "average" is the popular term for the arithmetic mean. In statistical work, "average" is the general term for any measure of central tendency.

THE MEAN

The arithmetic mean is the most reliable measure of central tendency; hence its wide use in scientific and educational literature. It is generally preferred to other averages as it is rigidly defined mathematically and is based upon all of the measures. It is advantageous when the scores are distributed symmetrically around a central point, when the measure of central tendency having the greatest stability is wanted, and when other statistics such as the standard deviation and the coefficient of correlation are to be computed later.

The mean is the average of the scores or measures. It is the sum of the separate scores divided by their number. It is dependent on the magnitude of the scores. Changing a score even by one, more or less, changes the mean.

The Absolute Mean

There are three methods of computing the mean. One method, the long or
absolute method, which is used when the data are ungrouped, is the subject of
this exercise.

Procedure in calculating the mean by the long or absolute method:

1. Find the sum of the series of ungrouped raw scores (∑X).

2. Count the scores to get the number of cases (N).

3. Divide the sum by the number of cases. The quotient is the arithmetic mean or simply the mean (M).

Formula: M = ∑X / N, where ∑ = sum of, X = scores / measures, N = number of cases.
EDUCATION 602 – Statistics

Name Course Date

The Mean by the Absolute Mean


Exercise No.8 a

163 186 165 151

177 157 156 169

189 147 152 159

176 163 161 166

167 156 158 179

182 148 164 172

153 142 197 178

172 157 161 173

165 198 183 162

180 143 167 171

193 171 168 168

162 136 167 162

188 173 167 173

2267 2077 2166 2183

I   = 2,267
II  = 2,077
III = 2,166
IV  = 2,183
∑X  = 8,693

M = ∑X / N = 8,693 / 52 = 167.17

EDUCATION 602 – Statistics

Name Course Date

The Mean by the Absolute Mean
Exercise No.8 a

23 48 29 41 43

29 33 19 16 52

31 34 38 32 49

34 40 29 46 40

16 33 41 45 40

12 36 40 34 42

36 15 43 18 27

39 24 34 32 34

34 57 14 25 41

27 33 42 51 44

39 27 40 34 35

51 41 34 37 26

24 25 40 38 50

26 22 20 31 29

40 18 37 38 35

38 30 14 31 31

32 38 41 36 46

37 26 46 45 20

43 38 21 45 32

37 30 35 37 59

648 648 657 712 775

I   = 648
II  = 648
III = 657
IV  = 712
V   = 775
∑X  = 3,440

M = ∑X / N = 3,440 / 100 = 34.40

WHEN TO USE THE VARIOUS MEASURES OF CENTRAL TENDENCY

Statistics in Psychology and Education by Henry E. Garrett:

1. USE THE MEAN


a. When the scores are distributed symmetrically around a central point, i.e. when the distribution is not badly skewed. The M is the center of gravity in the distribution, and each score contributes to its determination.
b. When the measure of central tendency having the greatest reliability is
wanted.
c. When other statistics (e.g., the SD or the coefficient of correlation) are to be computed later. Many statistics are based upon the mean.
2. USE THE MEDIAN
a. When the exact midpoint of the distribution is wanted.
b. When there are extreme scores which would markedly affect the mean.
c. When it is desired that certain scores should influence the central tendency, but all that is known about them is that they are above or below the median.
3. USE THE MODE

a. When a quick and approximate measure of central tendency is all that is
wanted.
b. When the measure of central tendency should be the most typical value.

Fundamental Statistics in Psychology and Education by J.P. Guilford:

1. COMPUTE THE ARITHMETIC MEAN WHEN:


a. The greatest reliability is wanted. It usually varies less from sample to
sample drawn from the same population.
b. Other computation, as finding measures of variability, is to follow.
c. The distribution is symmetrical about the center, particularly when it is
approximately normal.
d. We wish to know the “center of gravity” of a sample.
2. COMPUTE THE MEDIAN WHEN:
a. There is not sufficient time to compute a mean.
b. Distribution is badly skewed.
c. We are interested in whether cases fall within the upper or lower halves of
the distribution and not particularly in how far from the central point.
d. Incomplete distribution is given.
3. COMPUTE THE MODE WHEN:
a. The quickest estimate of central value is wanted.
b. A rough estimate of central value will do.
c. We wish to know what is the most typical.

THE MEAN BY THE MIDPOINT METHOD

When the scores are many and are grouped into a frequency distribution, the mean may be computed by using the midpoint method, the formula for which is:

M = ∑fx / N

where f = frequency (number of scores) of each interval, x = midpoint of each interval, and ∑ = sum of.

Procedures in calculating the mean by the midpoint method:

1. Lay off the frequency distribution, showing the class or step intervals
(column 1) and their corresponding frequencies (column 2). Add the
frequencies to get the number of cases (N).

2. Determine the midpoint (x) of every interval and enter it in column 3. The midpoint is equal to the lower limit plus one-half of the difference between the upper and lower limits:

Midpoint = Lower Limit + (Upper Limit - Lower Limit) / 2

3. Multiply the frequencies (f) by their midpoint values (x) and enter the
products in column 4.

4. Add the products (fx) of all the steps to get their sum (∑fx).

5. Divide the sum (∑fx) by the total number of cases (N) to obtain the mean.
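A Python sketch of the midpoint method using the 52-score distribution of the following exercise (the list structure and names are choices made here for illustration):

steps = [((195, 199), 2), ((190, 194), 1), ((185, 189), 3), ((180, 184), 3),
         ((175, 179), 4), ((170, 174), 7), ((165, 169), 10), ((160, 164), 8),
         ((155, 159), 6), ((150, 154), 3), ((145, 149), 2), ((140, 144), 2),
         ((135, 139), 1)]

n = sum(f for _, f in steps)                                      # N
sum_fx = sum(f * (lo + (hi - lo) / 2) for (lo, hi), f in steps)   # Σfx, x = midpoint
print("N =", n, "Σfx =", sum_fx, "M =", round(sum_fx / n, 2))     # 8684 / 52 = 167.0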

EDUCATION 602 – Statistics

Name Course Date

The Mean by the Midpoint Method


Exercise No.8 b

Scores F X fx

195-199 2 197 394

190-194 1 192 192

185-189 3 187 561

180-184 3 182 546

175-179 4 177 708

170-174 7 172 1204

165-169 10 167 1670

160-164 8 162 1296

155-159 6 157 942

150-154 3 152 456

145-149 2 147 294

140-144 2 142 284

135-139 1 137 137

N = 52 ∑fx = 8684

M = ∑fx / N = 8684 / 52 = 167

EDUCATION 602 – Statistics

Name Course Date

The Mean by the Midpoint Method


Exercise No.8 b

Scores F X fx

57-59 2 58 116

54-56 0 55 0

51-53 3 52 156

48-50 3 49 147

45-47 6 46 276

42-44 6 43 258

39-41 14 40 560

36-38 14 37 518

33-35 14 34 476

30-32 10 31 310

27-29 7 28 196

24-26 7 25 175

21-23 3 22 66

18-20 5 19 95

15-17 3 16 48

12-14 3 13 39

N = 100 ∑fx = 3436

M = ∑fx / N = 3436 / 100 = 34.36

MEAN BY THE SHORT METHOD

The short method is another method of computing the mean when the scores are many (more than 30) and are grouped into a frequency distribution. It is less cumbersome than the midpoint method because in the short method smaller figures are handled, facilitating computation.

The formula used in calculating the mean by the short method is:

M = AM + (∑fd / N) × i

Where AM = assumed mean, i.e., the midpoint of the step chosen as origin
f = frequency of every interval
d = deviation, in steps, upward or downward from the AM
i = size of the class interval

Procedures in calculating the mean by short method:

1. Lay off the frequency distribution showing the class or step intervals
(column 1) and their corresponding frequencies (column 2). Add the
frequencies to get the total number of cases (N).

2. Take the midpoint of any step as an assumed mean (AM). From this lay
off positive deviations upward and negative deviations downward
(column 3).

3. Multiply the frequencies by their respective deviation, keeping the


algebraic signs, fd (column 4).

4. Find the sum of the positive and negative products (∑fd) algebraically.

5. Divide the sum (∑fd) by N and multiply the quotient by the size of the class interval (i) to get the correction.

6. Add the correction to the assumed mean to obtain the true mean.
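A Python sketch of the short method, assuming the 165-169 step (midpoint 167) as the assumed mean, as in the exercise that follows; the variable names are illustrative.

freqs = [2, 1, 3, 3, 4, 7, 10, 8, 6, 3, 2, 2, 1]   # 195-199 down to 135-139
i = 5                                              # class interval size
am_index = 6                                       # position of the assumed-mean step
am = 167                                           # midpoint of 165-169

# d = number of steps above (+) or below (-) the assumed-mean step
sum_fd = sum(f * (am_index - k) for k, f in enumerate(freqs))
n = sum(freqs)
mean = am + (sum_fd / n) * i
print("Σfd =", sum_fd, "M =", mean)                # Σfd = 0, M = 167.0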

EDUCATION 602 – Statistics

Name Course Date

The Mean by the Short Method


Exercise No.8 c

Scores F D fd

195-199 2 6 12

190-194 1 5 5

185-189 3 4 12

180-184 3 3 9

175-179 4 2 8

170-174 7 1 7

165-169 10 0 AM = 167

160-164 8 -1 -8

155-159 6 -2 -12

150-154 3 -3 -9

145-149 2 -4 -8

140-144 2 -5 -10

135-139 1 -6 -6

N = 52 ∑fd = 0

M = AM + (∑fd / N) × i
  = 167 + (0 / 52) × 5
  = 167 + 0
  = 167

EDUCATION 602 – Statistics

Name Course Date

The Mean by the Short Method


Exercise No.8 c

Scores F D fd

57-59 2 8 16

54-56 0 7 0

51-53 3 6 18

48-50 3 5 15

45-47 6 4 24

42-44 6 3 18

39-41 14 2 28

36-38 14 1 14 133

33-35 14 0 AM = 34

30-32 10 -1 -10

27-29 7 -2 -14

24-26 7 -3 -21

21-23 3 -4 -12

18-20 5 -5 -25

15-17 3 -6 -18

12-14 3 -7 -21 -121

N = 100 ∑fd = 12

M = AM + (∑fd / N) × i
  = 34 + (12 / 100) × 3
  = 34 + 0.36
  = 34.36

Where:
AM = assumed mean
∑ = sum of
N = no. of cases
i = size of the class interval

THE MEDIAN

The median is that point on the scale above and below which lie 50% of the cases. It is a point-measure, dividing a group into two equal sub-groups. Hence, if the group is to be sectioned in two based on achievement or ability, the point of division is the median.

The median is an inspection measure. It is easily determined. If there are


few cases, the median is the middlemost score of the series of scores arranged
in order of size. If there are many cases, the median is computed by
interpolation. The ease with which the median is computed accounts for its
popularity and wide use by elementary school teachers.

The median is the most stable measure of central tendency. It is not much affected by extremely low or high scores. Hence, if there are extremely low or high scores and it is desired that these scores do not affect the average disproportionately, the median is used. Again, if there are relatively few cases, the median is computed.

The value of the median depends on the number of scores, and not on the magnitude of the scores. If most of the scores are high, the median is high; if most of the scores are low, the median is low.

Calculating the median when the data are ungrouped:

When ungrouped scores or other measures are arranged in order of size, the median is the midpoint of the series. Two situations arise in the computation of the median from ungrouped data:

a. When N is odd: Arrange the scores from the highest to the lowest, or vice versa. The middlemost score is the counting median or midscore.

b. When N is even: Arrange the scores from the highest to the lowest, or vice versa. The average of the two middlemost scores is the median.
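A short Python sketch of cases (a) and (b); the function name counting_median and the sample lists are assumptions.

def counting_median(scores):
    ordered = sorted(scores)                   # order of size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                             # N odd: the middlemost score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # N even: average of the two

print(counting_median([12, 15, 20, 22, 30]))        # 20
print(counting_median([12, 15, 20, 22, 30, 31]))    # 21.0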

EDUCATION 602 – Statistics

Name Course Date

The Median of Ungrouped Scores


Exercise No.9 a

Counting Median = Ctgn. Mdn.

GIVEN RAW SCORES:

Ctgn. Mdn.
198 173 167 158

197 173 166 157

193 173 165 157

189 172 165 156

188 172 164 156

186 171 163 153

183 171 163 152

182 169 162 151

180 168 162 148

179 168 162 147

178 167 161 143

177 167 161 142

176 167 Ctgn. Mdn.
159 136

PROCEDURE :

Arrange the scores from the highest to the lowest and get the

middle most scores for odd numbers; take the average of the

two middle most scores for even numbers.

EDUCATION 602 – Statistics

Name Course Date

The Median of Ungrouped Scores


Exercise No.9 a

Counting Median = Ctgn. Mdn.

GIVEN RAW SCORES:

59 40 35 29

57 40 34 27

52 40 34 27

51 40 34 27

51 40 34 26

50 40 34 26

49 40 34 26

48 39 34 25

46 39 34 25

46 38 33 24

46 38 33 24

45 38 33 23

45 38 32 22

45 38 32 21

44 38 32 20

43 38 32 20

43 37 31 19

43 37 31 18

42 37 31 18

42 37 31 16

41 36 30 16

41 36 30 15

41 36 29 14

41 35 29 14

41 35 29 12

Ctgn. Mdn. = (35 + 35) / 2 = 70 / 2 = 35
EDUCATION 602 – Statistics

Name Course Date

The Median by Interpolation


Exercise No.9 b

Scores     Freq.   Cumulative Freq.
195-199     2
190-194     1
185-189     3
180-184     3
175-179     4
170-174     7
165-169    10      (fm)               N/2 = 52/2 = 26
160-164     8      22   (F)           F  = 22
155-159     6      14                 fm = 10
150-154     3       8                 l  = 164.5
145-149     2       5                 i  = 5
140-144     2       3
135-139     1       1
N = 52

Formula: Mdn = l + ((N/2 - F) / fm) × i

Mdn = 164.5 + ((26 - 22) / 10) × 5
    = 164.5 + 20/10
    = 164.5 + 2
    = 166.5

Where:

l = exact lower limit of the step in which the Mdn lies

N/2 = ½ of the cases

F = partial (cumulative) sum which approaches or is equal to, but does not exceed, N/2

fm = frequency of the step in which the median lies

i = size of the class interval
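A Python sketch of the interpolation formula applied to the 52-score distribution (illustrative only; variable names are assumptions):

freqs = [1, 2, 2, 3, 6, 8, 10, 7, 4, 3, 3, 1, 2]    # 135-139 up to 195-199
lower_limits = [135 + 5 * k for k in range(len(freqs))]
i = 5
n = sum(freqs)

half = n / 2
cum = 0
for k, f in enumerate(freqs):
    if cum + f >= half:                    # this step contains the median
        l = lower_limits[k] - 0.5          # exact lower limit
        median = l + ((half - cum) / f) * i
        break
    cum += f                               # F: cumulative sum below the step

print(median)                              # 164.5 + ((26 - 22) / 10) * 5 = 166.5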

EDUCATION 602 – Statistics

Name Course Date

The Median by Interpolation


Exercise No.9 b

Scores     Freq.   Cumulative Freq.
57-59       2
54-56       0
51-53       3
48-50       3
45-47       6
42-44       6
39-41      14
36-38      14
33-35      14      (fm)               N/2 = 100/2 = 50
30-32      10      38   (F)           F  = 38
27-29       7      28                 fm = 14
24-26       7      21                 l  = 32.5
21-23       3      14                 i  = 3
18-20       5      11
15-17       3       6
12-14       3       3
N = 100

Formula: Mdn = l + ((N/2 - F) / fm) × i

Mdn = 32.5 + ((50 - 38) / 14) × 3
    = 32.5 + 36/14
    = 32.5 + 2.57
    = 35.07

Where:

l = exact lower limit of the step in which the Mdn lies

N/2 = ½ of the cases

F = partial (cumulative) sum which approaches or is equal to, but does not exceed, N/2

fm = frequency of the step in which the median lies

i = size of the class interval

THE MODE

The mode is the score which appears most frequently in a series, or which occurs the greatest number of times in a grouped distribution. It is otherwise known as the "commercial average" or typical value, as in the mode or fashion in dresses worn by the "average" woman.

Procedure in calculating the mode:

1. In a simple ungrouped series of measures the "rude" or "rough empirical" mode is that single measure or score which occurs most frequently. In case there are two most frequent scores, both are regarded as rough modes, and the group is considered "bimodal" or "polymodal".

2. When the data are grouped, the "crude" mode is the midpoint of the step with the greatest frequency.

3. The formula for approximating the true mode, when the frequency distribution is symmetrical, or at least not badly skewed, is:

Mode = 3 Mdn - 2 Mean

4. This mode is also called the refined or theoretical mode or "approximated" Pearson mode (Karl Pearson contributed the formula Mode = M - 3(M - Mdn), which reads exactly like the former). If the mean and median are equal, the mode is equal to either; if the mean is greater than the median, the mode is lowest; if the mean is lower than the median, the mode is highest.
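A brief Python sketch of the crude mode and of the Mode = 3 Mdn - 2 Mean approximation; the short sample list and the mean/median values (taken from the earlier exercises) are used only for illustration.

from collections import Counter

scores = [34, 40, 34, 37, 34, 31, 40, 38, 34, 29]
crude_mode = Counter(scores).most_common(1)[0][0]   # most frequent raw score
print("crude mode:", crude_mode)

mean, median = 167.0, 166.5                         # values from the exercises
mode = 3 * median - 2 * mean                        # Mode = 3 Mdn - 2 Mean
print("theoretical mode:", mode)                    # 165.5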

EDUCATION 602 – Statistics

Name Course Date

The Mode
Exercise No.10

A. Crude Mode:
1. Ungrouped Scores:
167 appears the greatest number of times

2. Grouped Scores:
The step(s) has the greatest number of tallies.
165-169 = 10

160-164 = 8

155-159 = 6

3. Frequency Distribution:

What are the midpoints of steps having the greatest frequency? What kind of
distribution?
165-169 = 167 (a unimodal distribution)

B. Refined Theoretical/ approximated True Mode:

1. Pearson:
Mo = M - 3( M - Mdn )
   = 167 - 3( 167 - 166.5 )
   = 167 - 1.5
   = 165.5

2. Simplified Garrett Mode:

Mo = ( 3 x Mdn ) - ( 2 x M )
   = ( 3 x 166.5 ) - ( 2 x 167 )
   = 499.5 - 334
   = 165.5

Note (Relationships) :

1. If M is greater than the median, the mode is lowest.
2. If M is smaller than the median, the mode is highest.
3. If M is equal to the median, the mode is equal to either.

EDUCATION 602 – Statistics

Name Course Date

The Mode
Exercise No.10

A. Crude Mode:

1. Ungrouped Scores:
34 appears the greatest number of times

2. Grouped Scores:
The step(s) has the greatest number of tallies.
39-41 = 14

36-38 = 14

33-35 = 14

3. Frequency Distribution:
What are the midpoints of steps having the greatest frequency? What kind of
distribution?
39-41 = 40

36-38 = 37

33-35 = 34

B. Refined Theoretical / Approximated True Mode / Pearson and Garrett Mode:

1. Pearson:

Mo = M - 3( M - Mdn )
   = 34.36 - 3( 34.36 - 35.07 )
   = 34.36 + 2.13
   = 36.49

2. Simplified Garrett Mode:

Mo = ( 3 x Mdn ) - ( 2 x M )
   = ( 3 x 35.07 ) - ( 2 x 34.36 )
   = 105.21 - 68.72
   = 36.49

MEASURES OF VARIABILITY

Ordinarily, after calculating a measure of central tendency, the next step is to find some measure of the variability of our scores, that is, of the "scatter" or "dispersion" of the separate scores around the central tendency. Four measures have been devised to indicate the variability within a set of measures: the range, the quartile deviation (Q), the average deviation (AD), and the standard deviation (SD or σ).

Calculating the range:
The range is the interval between the highest and the lowest scores. It is the most general measure of spread or scatter, and is computed when we wish to make a rough comparison of the variability of two or more groups. The range takes account of the extremes of the series of scores only and is unreliable when N is small, or when there are large gaps (i.e. zero f's) in the frequency distribution. In a frequency distribution, the range is taken to be equal to the difference between the midpoint of the highest step and the midpoint of the lowest step.
Computing the Quartile Deviation (Q):
The quartile deviation or Q is one-half the scale distance between the 75th and the 25th percentiles. The 25th percentile or Q1 is the first quartile on the score scale, the point below which lie 25% of the scores. The 75th percentile or Q3 is the third quartile on the score scale, the point below which lie 75% of the scores. When we have these two points, Q is found from the formula:

Q = ( Q3 - Q1 ) / 2

To find Q it is clear that we must first compute the 75th and 25th percentiles. These statistics are found in exactly the same way as was the median, which is, of course, the 50th percentile or Q2. The only difference is that ¼ of N is counted off from the low end of the distribution to find Q1 and that ¾ of N is counted off to find Q3. The formulas are:

Q1 = l + ((N/4 - F) / fq) × i     and     Q3 = l + ((3N/4 - F) / fq) × i

Where:
l = the exact lower limit of the interval in which the quartile falls
F = the cumulative sum of all frequencies from the lowest step, which sum approaches or is equal to (but does not exceed) N/4 for Q1 or 3N/4 for Q3
fq = the frequency on the interval containing the quartile


The quartiles Q1 and Q3 mark off the limits of the middle 50% of the scores in the distribution, and the distance between these two points is called the interquartile range. Q is ½ the range of the middle 50%, or the semi-interquartile range. Since Q measures the average distance of the quartile points from the median, it is a good index of score density. If the scores in the distribution are packed closely together, the quartiles will be near one another and Q will be small. If the scores are widely scattered, the quartiles will be relatively far apart and Q will be large.
The quartile deviation is used with the median. If the distribution is assumed to be normal, the median plus the quartile deviation gives the upper quartile (Mdn + Q = Q3); the median minus the quartile deviation gives the lower quartile (Mdn - Q = Q1). In a normal distribution, Q is called the probable error or PE (of the normal probability curve).
The quartile deviation may be used in sectioning or classifying pupils in a group or in the distribution of grades. For instance, from Mdn + Q to the highest score, section A or grade of A or 1; from Mdn to Mdn + Q, section B or grade of B or 2; from Mdn - Q to Mdn, section C or grade of C or 3; from the lowest score to Mdn - Q, section D or grade of D or 4. With this there will be four sections with practically equal numbers of pupils in each.
The quartile deviation indicates the homogeneity or heterogeneity
of the group. The smaller the Q, the more homogeneous is the group; the greater
the Q, the more heterogeneous is the group.
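A Python sketch of Q computed by the same interpolation used for the median, applied to the 52-score distribution (an illustration, not the manual's worksheet; the helper name percentile_point is an assumption):

freqs = [1, 2, 2, 3, 6, 8, 10, 7, 4, 3, 3, 1, 2]     # 135-139 up to 195-199
i, low = 5, 135
n = sum(freqs)

def percentile_point(count):
    """Interpolated score point below which `count` cases fall."""
    cum = 0
    for k, f in enumerate(freqs):
        if cum + f >= count:
            l = low + k * i - 0.5                     # exact lower limit
            return l + ((count - cum) / f) * i
        cum += f

q1 = percentile_point(n / 4)       # 25th percentile
q3 = percentile_point(3 * n / 4)   # 75th percentile
print(q1, q3, (q3 - q1) / 2)       # ~158.67, 174.5, Q ~ 7.92 (the worksheet, rounding Q1 first, gets 7.91)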

EDUCATION 602 - Statistics

Name Course Date

The Quartile Deviation


Exercise No.11

Scores Freq. Q1 Q2 Formulas:

195-199 2

190-194 1 Q3 – Q1
Q =
185-189 3 2

180-184 3

175-179 174.5 4 fq

170-174 7 39 F N
- F
165-169 10 32 4 i
Q1 = l+
160-164 8 22 fq

155-159 154.5 6 fq 14

150-154 3 8 F 8

145-149 2 5 5 3N
- F
140-144 2 3 3 Q3 = l+ 4 i

135-139 1 1 1 fq

N= 52

Q1 = l + ((N/4 - F) / fq) × i
   = 154.5 + ((13 - 8) / 6) × 5
   = 154.5 + 25/6
   = 154.5 + 4.17
   = 158.67

Q3 = l + ((3N/4 - F) / fq) × i
   = 174.5 + ((39 - 39) / 4) × 5
   = 174.5 + 0
   = 174.5

Q = (Q3 - Q1) / 2 = (174.5 - 158.67) / 2 = 7.91

Note: If Q is 10 or more, the group is heterogeneous; if Q is less than 10, the group is homogeneous.  ->  Homogeneous

EDUCATION 602 - Statistics

Name Course Date

The Quartile Deviation


Exercise No.11

Scores Freq. Q1 Q2 Formulas:

57-59 2

54-56 0 Q3 – Q1
Q =
51-53 3 2

48-50 3

45-47 6

42-44 6 N

39-41 14 f 4 - F
i
q Q1 = l+
36-38 14 66 F fq

33-35 14 52

30-32 10 38

27-29 7 f 28 3N
q - F
i
24-26 7 21 F 21 Q1 = l+ 4

21-23 3 14 14 fq

18-20 5 11 11

15-17 3 6 6

12-14 3 3 3

N = 100

Q1 = l + ((N/4 - F) / fq) × i
   = 26.5 + ((25 - 21) / 7) × 3
   = 26.5 + 1.714
   = 28.214

Q3 = l + ((3N/4 - F) / fq) × i
   = 38.5 + ((75 - 66) / 14) × 3
   = 38.5 + 1.929
   = 40.429

Q = (Q3 - Q1) / 2 = (40.429 - 28.214) / 2 = 6.108 or 6.11

Note: If Q is 10 or more, the group is heterogeneous; if Q is less than 10, the group is homogeneous.  ->  Homogeneous

THE AVERAGE DEVIATION

The average deviation or AD (also written as mean deviation or MD) is the mean of the deviations of all the separate scores in a series taken from their mean (occasionally from the median or mode). It includes the middle of the cases. It is larger than the quartile deviation.
The average deviation is affected by every score; hence, if it is desired to have every score carry weight in the measure of variability, the average deviation should be used.
The AD, like other measures of dispersion, is used in determining the extent of difference or variability among the members of a group. The higher the AD, the more variable or heterogeneous is the group; the smaller the AD, the more compact or homogeneous is the group. From this it may be seen that the AD can be used in classifying pupils.
The long method of computing the average deviation is used when there are few cases, not more than 30. The short method is used when there are many cases grouped with intervals. The former method is simple but laborious; the latter is more complicated but requires a shorter time if there are many cases.
Computation of the AD from ungrouped scores:
To find the AD, no account is taken of signs, and all deviations, whether plus or minus, are treated as positive. The formula for the AD of ungrouped scores is:

AD = ∑|x| / N

in which the bars | | enclosing the x indicate that signs are disregarded in arriving at the sum. As always, x is the deviation of a score from the mean, i.e. X - M = x.
1. Find the arithmetic mean by the long method.

2. Subtract the mean from every score to get the deviation (X).

3. Add the deviations arithmetically, i.e., regardless of the positive and


negative signs.

4. Divide the sum of the deviations (∑/X/) by the number of cases (N).

Calculating the AD from Grouped Data:

The AD is rarely used in modern statistics, but it is often found in the older experimental literature. Should the student find it necessary to compute the AD from grouped data, the formula is:

AD = ∑|fx| / N     where fx = the product of the deviations by their frequencies

1. Compute the arithmetic mean by the short method.

2. Subtract the mean from the midpoint of every step to find the deviation
(x= Midpoint – Mean).

3. Multiply each deviation by the corresponding frequency to obtain fx.

4. Add the product (fx) arithmetically to get their sum (∑ /fx/).

5. Divide the arithmetic sum of the product (∑ /fx/) by the number of cases (N). The
quotient is the AD.
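A Python sketch of the grouped-data formula using the 52-score distribution and the mean of 167 found earlier (illustrative only; the list structure is an assumption):

steps = [((195, 199), 2), ((190, 194), 1), ((185, 189), 3), ((180, 184), 3),
         ((175, 179), 4), ((170, 174), 7), ((165, 169), 10), ((160, 164), 8),
         ((155, 159), 6), ((150, 154), 3), ((145, 149), 2), ((140, 144), 2),
         ((135, 139), 1)]
mean = 167.0                                   # from the earlier exercise

n = sum(f for _, f in steps)
sum_abs_fx = sum(f * abs((lo + hi) / 2 - mean) for (lo, hi), f in steps)
print("Σ|fx| =", sum_abs_fx, "AD =", round(sum_abs_fx / n, 2))   # 530, 10.19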

EDUCATION 602 - Statistics

Name Course Date

The Average Deviation


Exercise No.12

Given in the previous exercise = 167

Scores Frequencies Midpoints X Fx

195-199 2 197 30 60

190-194 1 192 25 25

185-189 3 187 20 60

180-184 3 182 15 45

175-179 4 177 10 40

170-174 7 172 5 35

165-169 10 167 0 0

160-164 8 162 -5 40

155-159 6 157 -10 60

150-154 3 152 -15 45

145-149 2 147 -20 40

140-144 2 142 -25 50

135-139 1 137 -30 30

N = 52 /fx/ = 530

Midpt. = l + (ul - ll) / 2 = 195 + (199 - 195) / 2 = 195 + 2 = 197

AD = ∑|fx| / N = 530 / 52 = 10.19

If AD is 12 or more, the group is heterogeneous.

If AD is less than 12, the group is homogeneous.

EDUCATION 602 - Statistics

Name Course Date

The Average Deviation


Exercise No.12

Given in the previous exercise = 34.36

Scores Frequencies Midpoints X Fx

57-59 2 58 23.64 47.28

54-56 0 55 20.64 0

51-53 3 52 17.64 52.92

48-50 3 49 14.64 43.92

45-47 6 46 11.64 69.84

42-44 6 43 8.64 51.84

39-41 14 40 5.64 78.96

36-38 14 37 2.64 36.96

33-35 14 34 -0.36 5.04

30-32 10 31 -3.36 33.6

27-29 7 28 -6.36 44.52

24-26 7 25 -9.36 65.52

21-23 3 22 -12.36 37.08

18-20 5 19 -15.36 76.8

15-17 3 16 -18.36 55.08

12-14 3 13 -21.36 64.08

N = 100 /fx/ = 763.44

Midpt. = l + (ul - ll) / 2 = 57 + (59 - 57) / 2 = 57 + 1 = 58

AD = ∑|fx| / N = 763.44 / 100 = 7.63     Homogeneous

If AD is 12 or more, the group is heterogeneous.

If AD is less than 12, the group is homogeneous.

THE STANDARD DEVIATION

The standard deviation or SD is the most stable index of variability and


is customarily employed in experimental work and in research studies. The SD differs from the AD in several respects. In computing the AD, we disregard signs and treat all deviations as positive, whereas in finding the SD we avoid the difficulty of signs by squaring the separate deviations. The squared deviations used in computing the SD are always taken from the mean, never from the median or mode. The conventional symbol for the SD is the Greek letter sigma (σ).

The SD is less affected by sampling errors than the Q or the AD. In a normal, or nearly symmetrical, distribution the mean ± 1 SD marks the limits of the middle 68.26% (roughly the middle two-thirds) of the distribution. The SD is, therefore, larger than the AD, which is in turn larger than Q. These relationships supply a rough check upon the accuracy of the measures of variability.

With the sigma (SD) known, it is possible to estimate the other measures: AD = 0.7979·SD; Q = 0.6745·SD; PE (probable error of the distribution) = 0.6745·SD; V (coefficient of variability) = 100·SD/M.

In the distribution of grades in the transmutation of raw scores, the SD is


used with the arithmetic mean. To get the limits, the SD is multiplied by 0.5 and
by 1.5 and the result is added to, and subtracted from the mean.

LIMITS GRADE or RATINGS

M + 1.5 SD to highest scores A or 1

M + 0.5 SD to M + 1.5 SD B or 2

M – 0.5 SD to M + 0.5 SD C or 3

M – 1.5 SD to M – 0.5 SD D or 4

Lowest score to M – 1.5 SD E or 5

The SD is also used in the comparison of groups. The higher the SD, the
more heterogeneous is the group. The smaller is the SD, the more homogeneous
is the group.
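The limits table above translates directly into a small grading routine. The Python sketch below is illustrative only; the function name is an assumption, and the sample M and SD are the values that appear later in Exercise No. 22b (M = 34.36, SD = 9.76).

```python
# Letter grade from the mean and SD, following the limits table above:
# A: M + 1.5 SD and up; B: M + .5 SD to M + 1.5 SD; C: M - .5 SD to M + .5 SD; etc.
def grade(score, mean, sd):
    if score >= mean + 1.5 * sd: return "A"
    if score >= mean + 0.5 * sd: return "B"
    if score >= mean - 0.5 * sd: return "C"
    if score >= mean - 1.5 * sd: return "D"
    return "E"

mean, sd = 34.36, 9.76           # values used in Exercise No. 22b
for s in (59, 45, 34, 25, 12):   # a few sample raw scores
    print(s, grade(s, mean, sd))
```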

EDUCATION 602 - Statistics

Name Course Date

The Standard Deviation by the Long Method


Exercise No.13a

Formula :

SD = √( Σfd² / N )

Where:
d = deviation of every midpoint from the mean (d = Midpt. – M)
fd² = fd × d, i.e., each squared deviation weighted by its frequency

(Scratch work for the root extraction: √(9300/52) = √178.846 ≈ 13.373; re-check: 13.373 × 13.373 ≈ 178.85.)

EDUCATION 602 - Statistics

Name Course Date

The Standard Deviation by the Long Method


Exercise No.13a

Scores Freq. Midpt. D fd fd2

195-199 2 197 30 60 1800

190-194 1 192 25 25 625

185-189 3 187 20 60 1200

180-184 3 182 15 45 675

175-179 4 177 10 40 400

170-174 7 172 5 35 175

165-169 10 167 0 0 0

160-164 8 162 -5 40 200

155-159 6 157 -10 60 600

150-154 3 152 -15 45 675

145-149 2 147 -20 40 800

140-144 2 142 -25 50 1250

135-139 1 137 -30 30 900

∑ fd2 = 9300

SD = √( Σfd² / N ) = √( 9300 / 52 ) = √178.85 = 13.37

Interpretation :

If SD is less than 15, the group is homogeneous.

If SD is 15 or more, the group is heterogeneous.

EDUCATION 602 - Statistics

Name Course Date

The Standard Deviation by the Long Method


Exercise No.13a

Scores Freq. Midpt. D fd fd2

57-59 2 58 23.64 47.28 1117.699

54-56 0 55 20.64 0 0

51-53 3 52 17.64 52.92 933.5088

48-50 3 49 14.64 43.92 642.9888

45-47 6 46 11.64 69.84 812.9376

42-44 6 43 8.64 51.84 447.8976

39-41 14 40 5.64 78.96 445.3344

36-38 14 37 2.64 36.96 97.5744

33-35 14 34 -0.36 5.04 1.8144

30-32 10 31 -3.36 33.6 112.896

27-29 7 28 -6.36 44.52 283.1472

24-26 7 25 -9.36 65.52 613.2672

21-23 3 22 -12.36 37.08 458.3088

18-20 5 19 -15.36 76.8 1179.648

15-17 3 16 -18.36 55.08 1011.269

12-14 3 13 -21.36 64.08 1368.749

∑ fd2 = 9527.04

SD = √( Σfd² / N ) = √( 9527.04 / 100 ) = √95.27 = 9.76

Interpretation :

If SD is less than 15, the group is homogeneous.

If SD is 15 or more, the group is heterogeneous.

Calculating the SD from the Ungrouped Scores:

The formula is:

SD = √( Σx² / N ),  where x² = the squared deviations from the mean

1. Find the mean.


2. Subtract the arithmetic mean algebraically from every score to get the
deviation (x). When the score is numerically greater than the mean, the x
will be plus; when numerically less than the mean, the x will be minus.
3. Square each deviation (x2).
4. Add the squared deviations to get their sum (Σx²).
5. Divide the sum (Σx²) by the number of cases (N) to get the mean of the squared deviations.
6. Extract the square root of the mean of the squared deviations. The result is the SD or sigma.
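The six steps above can be checked in Python; this is only an illustrative sketch with hypothetical scores.

```python
import math

# SD of ungrouped scores: SD = sqrt( sum((X - M)^2) / N )
scores = [12, 15, 11, 18, 14]                    # hypothetical raw scores
mean = sum(scores) / len(scores)                 # step 1
sq_devs = [(x - mean) ** 2 for x in scores]      # steps 2-3: squared deviations
sd = math.sqrt(sum(sq_devs) / len(scores))       # steps 4-6
print(f"M = {mean:.2f}, SD = {sd:.2f}")
```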

Calculating of the Grouped Data:

(a) By the long Method:

The process is identical with that used for ungrouped items except that, in addition to squaring the deviation (represented by x or d) of each midpoint from the mean, we weight each of these squared deviations by the frequency which it represents – that is, by the frequency opposite it.

SD = √( Σfd² / N ),  where fd² = d × fd

(b) By the Short Method:

The short method used in calculating the mean consisted essentially in “guessing” or assuming the mean, and later applying a correction to give the actual mean. It is a decided time- and labor-saver in dealing with grouped data, and is well-nigh indispensable in the calculation of σ's in a correlation table.

SD = √( Σfd²/N – (Σfd/N)² )

1. Determine the point of origin (0) as in the computation of the


mean by the short method.

2. From this lay off (d), positive deviation (1, 2, 3, etc.) upward and
negative deviations (-1, -2, -3, etc.) downward.

3. Multiply each deviation (d) by its frequency (f) to fill the fd column.

4. Multiply each deviation (d) by the frequency times deviation (fd) to obtain fd² (column 5). This is the same as squaring each deviation and multiplying it by its corresponding frequency.

5. Find the algebraic sum of fd and fd 2 to get ∑fd and ∑fd2


respectively.

6. Apply the formula.
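The short-method steps are sketched in Python below for illustration, using the frequencies and midpoints of Exercise No. 12 with an assumed origin of 167. Note that Σfd is the algebraic sum (signs retained).

```python
import math

# Short-method SD for grouped data: SD = sqrt( sum(fd^2)/N - (sum(fd)/N)^2 ),
# with d measured in score units from an assumed origin, so no unit change is needed.
freqs     = [2, 1, 3, 3, 4, 7, 10, 8, 6, 3, 2, 2, 1]
midpoints = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142, 137]
origin = 167                                                     # step 1: assumed origin

n = sum(freqs)
fd  = [f * (m - origin) for f, m in zip(freqs, midpoints)]       # step 3
fd2 = [f * (m - origin) ** 2 for f, m in zip(freqs, midpoints)]  # step 4
sd = math.sqrt(sum(fd2) / n - (sum(fd) / n) ** 2)                # step 6
print(f"sum fd = {sum(fd)}, sum fd^2 = {sum(fd2)}, SD = {sd:.2f}")
```

Because the assumed origin chosen here happens to be the actual mean (167), the correction term (Σfd/N)² vanishes and the result agrees with the long-method value of 13.37.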

(c) By the percentile Method:

The SD may be estimated fairly accurately by means of the


formula:

SD ≈ 0.4 × D, or SD ≈ 0.4 (P90 – P10), where D = P90 – P10

1. Obtain the 90th percentile.


2. Obtain the 10th percentile.
3. Substitute in the formula.

Though Q is used much more frequently than D, the latter is a far


better percentile measure of variability and is considerably easier to
compute.

EDUCATION 602 - Statistics

Name Course Date

The Standard Deviation by the Short Method


Exercise No.13b

Scores Freq. Midpt. D fd fd2

195-199 2 197 30 60 1800

190-194 1 192 25 25 625

185-189 3 187 20 60 1200

180-184 3 182 15 45 675

175-179 4 177 10 40 400

170-174 7 172 5 35 175

165-169 10 167 0 0 0

160-164 8 162 -5 40 200

155-159 6 157 -10 60 600

150-154 3 152 -15 45 675

145-149 2 147 -20 40 800

140-144 2 142 -25 50 1250

135-139 1 137 -30 30 900

∑fd= 530 ∑ fd2 = 9300

SD = √( Σfd²/N – (Σfd/N)² )

   = √( 9300/52 – (530/52)² )

   = √74.96

   = 8.66     Homogeneous

Interpretation :

If SD is less than 15, the group is homogeneous.

If SD is 15 or more, the group is heterogeneous.

EDUCATION 602 - Statistics

Name Course Date

The Standard Deviation by the Short Method


Exercise No.13b

Scores Freq. Midpt. D fd fd2

57-59 2 58 23.64 47.28 1117.699

54-56 0 55 20.64 0 0

51-53 3 52 17.64 52.92 933.5088

48-50 3 49 14.64 43.92 642.9888

45-47 6 46 11.64 69.84 812.9376

42-44 6 43 8.64 51.84 447.8976

39-41 14 40 5.64 78.96 445.3344

36-38 14 37 2.64 36.96 97.5744

33-35 14 34 -0.36 5.04 1.8144

30-32 10 31 -3.36 33.6 112.896

27-29 7 28 -6.36 44.52 283.1472

24-26 7 25 -9.36 65.52 613.2672

21-23 3 22 -12.36 37.08 458.3088

18-20 5 19 -15.36 76.8 1179.648

15-17 3 16 -18.36 55.08 1011.269

12-14 3 13 -21.36 64.08 1368.749

∑fd =763.44 ∑ fd2 = 9527.04

SD = √( Σfd²/N – (Σfd/N)² )

   = √( 9527.04/100 – (763.44/100)² )

   = √( 95.2704 – 58.2841 )

   = √36.9863

   = 6.08     Homogeneous

Interpretation :

If SD is less than 15, the group is homogeneous.

If SD is 15 or more, the group is heterogeneous.

THE CUMULATIVE FREQUENCY GRAPH

The cumulative frequency graph is another way of representing a


frequency distribution by means of a diagram. Before we can plot a cumulative
frequency graph, the scores of the distribution must be added serially or
cumulated. The height of the graph indicates the total number of frequencies.

CONSTRUCTION OF THE CUMULATIVE FREQUENCY GRAPH:

1. Find the cumulative frequency of every step beginning with the lowest step
by adding the f’s cumulatively upward. (The last cumulative f is equal to
N).

2. Draw OX and OY, as in the frequency polygon or histogram.

3. Lay off the intervals or steps on OX and mark on OY successive unit


distances to represent the cumulative frequencies on different steps. (Use
the exact limits of the intervals).

4. Read off the steps, together with the corresponding cumulative
frequencies and place points through the exact upper limits of the steps.

5. Connect the successive points by straight lines, and at the lower end drop a line to the exact lower limit of the lowest step (also the exact upper limit of the step next below the lowest, the f of which is 0).
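The cumulation in step 1 can be done mechanically, as in the Python sketch below. The intervals and frequencies are those of the 135–199 distribution used in the earlier exercises; the layout of the code itself is an assumption.

```python
# Cumulative frequencies, added serially from the lowest step upward.
intervals = [(135, 139), (140, 144), (145, 149), (150, 154), (155, 159),
             (160, 164), (165, 169), (170, 174), (175, 179), (180, 184),
             (185, 189), (190, 194), (195, 199)]          # lowest step first
freqs = [1, 2, 2, 3, 6, 8, 10, 7, 4, 3, 3, 1, 2]

cum = 0
for (lo, hi), f in zip(intervals, freqs):
    cum += f
    # the point is plotted at the exact upper limit (hi + 0.5) against cum
    print(f"{lo}-{hi}: f = {f:2d}, cum f = {cum}")
print("Last cumulative f equals N:", cum == sum(freqs))
```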

Name Course Date

CALCULATION OF PERCENTILES IN A FREQUENCY
DISTRIBUTION

We have learned that the median is that point in a frequency distribution


below which lie 50% of the measures or scores; and that Q1 and Q3 mark points in the distribution below which lie, respectively, 25% and 75% of the measures or scores. Using the same method by which the median and the quartiles were found, we may compute points below which lie 10%, 43%, 85%, or any percent of the scores. These points are called percentiles and are designated, in general, by the symbol Pp, the subscript p referring to the percentage of cases below the given value. P10, for example, is the point below which lie 10% of the scores. It is evident that the median, expressed as a percentile, is P50; also, Q1 is P25 and Q3 is P75.

The method of calculating percentiles is essentially the same as that


employed in finding the median. The formula is

Pp = l + ( (pN – F) / fp ) × i

Where:

Pp = the percentile wanted, e.g., P10, P33, etc.

l = exact lower limit of the class interval upon which Pp lies.

pN = part of N to be counted off in order to reach Pp.

F = sum of all the frequencies upon all intervals below l.

fp = number of scores (f) within the interval upon which Pp falls.

i = length or size of the class interval.

Procedure:

1. Multiply N by the percentile desired.

2. Add the frequencies cumulatively upward to get the partial sum (F), which should approach or equal, but not exceed, the percentile sum (pN).

3. Subtract the partial sum (F) from the corresponding percentile sum (pN) and divide the difference by the frequency (fp) of the step containing the percentile desired; multiply the quotient by the size of the class interval (i) to get the correction.

4. Add the correction to the exact lower limit (l) of the step containing the percentile desired. The result is the percentile.
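For illustration, the formula and the four steps are sketched below in Python against the 135–199 distribution of the earlier exercises. The helper name and interval layout are assumptions, not part of the original procedure.

```python
# Percentile point in a grouped frequency distribution:
#   Pp = l + ((p*N - F) / fp) * i
def percentile(intervals, freqs, p):
    n = sum(freqs)
    target = p * n                       # step 1: part of N to count off
    cum = 0
    for (lo, hi), f in zip(intervals, freqs):   # lowest step first
        if cum + f >= target and f > 0:  # interval upon which Pp falls
            l = lo - 0.5                 # exact lower limit
            i = hi - lo + 1              # size of the class interval
            return l + (target - cum) / f * i
        cum += f
    return None

intervals = [(135, 139), (140, 144), (145, 149), (150, 154), (155, 159),
             (160, 164), (165, 169), (170, 174), (175, 179), (180, 184),
             (185, 189), (190, 194), (195, 199)]
freqs = [1, 2, 2, 3, 6, 8, 10, 7, 4, 3, 3, 1, 2]
print(round(percentile(intervals, freqs, 0.50), 2))   # the median (P50)
print(round(percentile(intervals, freqs, 0.25), 2))   # Q1 (P25)
```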

P0 and P100 mark the exact lower limit of the first interval the exact upper
limit of the last interval, respectively. These two percentiles represent limiting
points. Their principal value is to indicate the boundaries of the percentile scale.

Percentiles are used in the transmutation of raw scores. A pupil whose score is equal to P20 surpasses 20% of the group and is surpassed by 80%. A percentile, therefore, indicates the percentage of pupils surpassed by a pupil and the percentage of pupils that surpass him.

Percentiles can be used as grades, and are more reliable and comparable
than grades, letters, or numbers. Under a strict or a lenient teacher, percentiles
which are numerically equal have the same meaning. A grade P35 given by a
strict teacher is equal to a grade of P35 given by a lenient teacher.

Name Course Date

CALCULATING OF PERCENTILE RANKS IN A FREQUENCY
DISTRIBUTION

Percentiles are points in a continuous distribution below which lie given percentages of N. The percentile rank is the position on a scale of 100 to which a subject's score entitles him. In calculating percentiles we start with a certain percent of N, say 15%; we then count into the distribution the given percent of the cases, and the point reached is the required percentile. Calculating percentile ranks is the reverse of this process: here we begin with an individual score and determine the percentage of scores which lies below it. If this percentage is 62, for example, the score has a percentile rank or PR of 62 on a scale of 100.

Formula:  PR = [ (f/i) × (score – l) + cum.f ] × 100 / N
Where:

f = frequency of the interval where the score falls

i = size of the class interval

l = exact lower limit of the interval where the score falls

cum f. = number of scores (cumulative frequency below l)

N = total number of cases

Procedure in Calculating PR:

1. Determine the class interval or step where a given score falls.

2. Determine the frequency (f) on this interval and divide it by i.

3. Multiply the difference of the exact lower limit (l) and the given score by
the quotient obtained in step 2.

4. Add the product obtained in step 3 to the cumulative sum of the frequencies (cum. f) below l, divide the resulting sum by the total number of cases (N), and multiply by 100. This gives the PR of the given score.
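A minimal Python sketch of the PR formula follows, again using the 135–199 distribution for illustration; the function name is hypothetical.

```python
# Percentile rank of a given score:
#   PR = 100/N * ( (f/i) * (score - l) + cum_f )
def percentile_rank(intervals, freqs, score):
    n = sum(freqs)
    cum = 0
    for (lo, hi), f in zip(intervals, freqs):       # lowest step first
        l, u = lo - 0.5, hi + 0.5                   # exact limits
        if l <= score <= u:
            i = u - l
            return 100.0 / n * ((f / i) * (score - l) + cum)
        cum += f
    return None

intervals = [(135, 139), (140, 144), (145, 149), (150, 154), (155, 159),
             (160, 164), (165, 169), (170, 174), (175, 179), (180, 184),
             (185, 189), (190, 194), (195, 199)]
freqs = [1, 2, 2, 3, 6, 8, 10, 7, 4, 3, 3, 1, 2]
print(round(percentile_rank(intervals, freqs, 166.5), 1))   # about 50 (the median point)
```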

Name Course Date

THE CUMULATIVE FREQUENCY CURVE OR OGIVE

The cumulative frequency curve or ogive differs from the cumulative


frequency graph in that frequencies are expressed as cumulative percents of N
on the Y axis instead of as cumulative frequencies.

Construction of the ogive:

1. Lay off a cumulative percentage distribution:

a. List the class intervals and their frequencies (columns 1 and


2)
b. Cumulate the f’s from the low end of the distribution upward
(column 3)
c. Compute the cumulative percents by dividing each cum. f by N. A better method is to determine first the reciprocal 1/N, called the rate, and multiply each cumulative f in order by this fraction (column 4).

2. Plot the ogive from the data in column 4, that is, the cumulative percent frequencies.

a. Draw OX and OY, as in the frequency polygon or histogram


or cumulative frequency graph
b. Lay off the exact interval limits of the distribution on OX and
mark off on OY successive units distances to represent the
cumulative percentages on the different steps.
c. Read off the steps, together with the corresponding
cumulative percentages and place points through the exact
upper limits of the steps.
d. Connect the successive points by straight lines, and at the
lower end drop a line to the exact limit of the lowest step.

Uses of the ogive:

1. Percentiles and percentile ranks may be determined quickly and fairly


accurately from the ogive. To obtain P50, the median for example draw a
line from 50 on the Y scale parallel to the X axis. This will locate the
median approximately. In order to read the percentile ranks of a given

score from the ogive, reverse the process in determining percentiles.
Percentiles and percentile ranks will often be slightly in error when read
from the ogive, but this can be made very small when the curve is
carefully drawn, the scale division precisely marked, and the diagram
fairly large.

2. A useful comparison of two or more groups is provided when ogives


representing their scores on a given test are plotted upon the same
coordinate axis. Differences in achievement as between the groups are
shown by the distances separating the two curves at various levels.

3. Percentile norms may be determined directly from the smoothed ogives.


(Norms are measures of achievement which represent the typical
performance of some designated group or groups). However, percentile
norms read from an ogive are not strictly accurate, but the error is slight
except at the top and bottom of the distribution. Estimates of these
extreme percentiles from smoothed ogives are probably more nearly true
values than are calculated points, since the smoothed curve represents
what we might expect to get from large groups or additional samplings.

Name Course Date

MEASURING DIVERGENCE FROM NORMALITY

To find the divergence of the actual distribution (represented by the
histogram) from the best fitting normal curve that has been superimposed,
both the skewness and the kurtosis should be computed.

A useful index of skewness is given by the formula

Sk = 3(M – Mdn) / SD     or     Sk = (P90 + P10)/2 – P50

A distribution is said to be skewed when the M and the MDN fall at


different points in the distribution, and the balance (or center of gravity) is
shifted to one side or the other – to the left or right. In a normal distribution,
the M equals the MDN exactly and the skewness is of course zero. The
more nearly the distribution approaches the normal form, the closer
together are the M and the MDN, and the less the skewness. Distributions are said to be skewed negatively (to the left) when scores are massed at the high end of the scale (right end) and are spread out more gradually toward the low end (left). Distributions are skewed positively (to the right) when scores are massed at the low (left) end of the scale and are spread out gradually toward the high or right end. Moreover, when skewness is negative, the M lies to the left of the MDN, and when skewness is positive, the M lies to the right of the MDN.

The term kurtosis refers to the peakedness or flatness of a


frequency distribution as compared with the normal. A frequency
distribution more peaked than normal is said to be leptokurtic; one flatter
than the normal, platykurtic. A normal curve is called mesokurtic. A formula
for measuring Kurtosis is :
Ku = Q / (P90 – P10)

For normal curve the formula gives Ku = 0.263. If Ku is greater than


0.263 the distribution is platykurtic, if less than 0.263 the distribution is
leptokurtic.
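Both indices are easy to compute once the percentiles are known. The Python sketch below is illustrative only; the sample scores are hypothetical, and the built-in statistics.quantiles function is used simply to get rough percentile estimates.

```python
# Sk = 3(M - Mdn)/SD  or  Sk = (P90 + P10)/2 - P50 ;  Ku = Q / (P90 - P10)
import statistics

scores = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20, 22, 25, 31]   # hypothetical

mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.pstdev(scores)
pct = statistics.quantiles(scores, n=100, method="inclusive")   # P1 ... P99
p10, p25, p50, p75, p90 = pct[9], pct[24], pct[49], pct[74], pct[89]
q = (p75 - p25) / 2                                             # quartile deviation

sk1 = 3 * (mean - median) / sd
sk2 = (p90 + p10) / 2 - p50
ku = q / (p90 - p10)                # 0.263 for a normal curve

print(f"Sk = {sk1:.2f} (or {sk2:.2f}), Ku = {ku:.3f}")
```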

Name Course Date

PLOTTING THE BEST FITTING NORMAL CURVE

A normal curve of the same N, M, SD, as the actual distribution may


be superimposed on the histogram or frequency polygon of the distribution.
Such a model curve is the best-fitting normal distribution. The research worker often wishes to compare his distribution by eye with the normal curve which “best fits” the data, and such a comparison may profitably be made even if no measures of divergence from normality are computed. In fact, the direction and extent of asymmetry often strike us more convincingly when seen in a graph than when expressed by measures of skewness and kurtosis. It may be noted that a normal curve can always be readily constructed by following the procedures given here, provided the area N and the variability SD are known.

The scores of the actual frequency distribution should be


represented by a histogram instead of by a frequency polygon in order to
prevent coincidence of the surface outlines and to bring out more clearly
agreement and disagreement at different points. To plot a normal curve
over this histogram, we first compute the height of the maximum ordinate. The maximum ordinate y₀ can be determined from the equation of the normal curve:

y₀ = N / ( σ √(2π) )

Where:

σ = the SD of the distribution expressed in units of class intervals (σ = SD / i), because the units on the X axis are in terms of class intervals;

√(2π) = √(2 × 3.1416) = 2.51 (constant).

For example, if N = 52, SD = 13.37, i = 5, and M = 167, then σ = 13.37/5 = 2.67 and

y₀ = 52 / (2.67 × 2.51) = 52 / 6.7 = 7.76, or 7.8,

the height of the maximum ordinate of the best-fitting normal curve. This point is at the middlemost point of the histogram.

Note:

Round off your answer to one decimal for convenience in plotting the
normal curve.

Knowing y₀, we are able to compute, from Table B, the heights of the ordinates at given distances from the mean:

±1σ: 0.60653 × 7.8 = 4.7

±2σ: 0.13534 × 7.8 = 1.0

±3σ: 0.01111 × 7.8 = 0.09 or 0.1

The normal probability curve may be sketched in without much


difficulty through the ordinates at these seven points. Somewhat greater accuracy will be obtained if various intermediate ordinates, for example at ±0.5σ, ±1.5σ, etc., are also plotted.
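The maximum ordinate and the intermediate ordinates can be computed as below. This is only a sketch using the example's N = 52, SD = 13.37, and i = 5; the tabled ratios (0.60653, 0.13534, 0.01111) are simply e^(-x²/2) for x = 1, 2, 3.

```python
import math

# Height of the best-fitting normal curve: y0 = N / (sigma * sqrt(2*pi)),
# with sigma expressed in class-interval units (SD / i).
N, SD, i = 52, 13.37, 5
sigma = SD / i                                  # about 2.67 interval units
y0 = N / (sigma * math.sqrt(2 * math.pi))       # maximum ordinate, about 7.8
print(f"y0 = {y0:.1f}")

# Ordinates at whole-sigma distances from the mean.
for x in (1, 2, 3):
    print(f"+/-{x} sigma: {math.exp(-x * x / 2) * y0:.1f}")
```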

Name Course Date

THE NORMAL PROBABILITY CURVE – ITS NATURE AND IMPORTANCE

The normal probability curve, or simply the normal curve, is a bell


shaped, symmetrical curve. Frequency distributions of data drawn from anthropometry, psychology, meteorology, and education resemble the normal probability curve. In it, the mean, median, and mode fall at the same point, and there is a perfect balance or symmetry between the right and left halves of the figure.

The normal probability curve is of great importance in educational


measurement because of its usefulness in the construction of tests and
scales and in many calculations involving quantitative data. It is usually
necessary in certain problems to assume some form of distribution, and
the normal curve is taken because it gives the best single approximation to
the ordinary distribution of the scores.

An unsymmetrical distribution is called a skewed distribution. In
a skewed distribution, the mean, median, and the mode fall at
different points, in the distribution. Skewness is computed by the
formula:

Sk = 3(M – Mdn) / SD     or     Sk = (P90 + P10)/2 – P50

When the mean is smaller than the median, the distribution is skewed negatively, or to the left; that is, the scores are massed at the high (right) end of the scale and spread out gradually at the lower end. When the mean is greater than the median, the distribution is skewed positively, or to the right; that is, the scores are massed at the low (left) end of the scale and are spread out gradually at the high (right) end.

The results of a test that is easy are negatively skewed and those of a test that is difficult are positively skewed. The results of a test of moderate difficulty approach a normal curve.

There are several reasons why distributions are skewed:

a. Small size of the group measured or tested

b. Selection of a special group

c. Technical faults in the construction of the test and errors in


scoring.

Note:

The normal distribution is not an actual distribution of test scores but is, instead, a mathematical model. Frequency distributions of scores approach the theoretical distribution as a limit, but the fit is rarely perfect.

Principle:

Measurements of many natural phenomena and of many mental and


social traits under certain conditions tend to be distributed symmetrically about
their means in proportions which approximate those of the normal probability
distribution.

Much evidence has accumulated to show that the normal distribution
serves to describe the frequency of occurrence of many variable facts with a
relatively high degree of accuracy. Phenomena which follow the normal
probability curve (at least approximately) may be classified as:

1. Biological statistics: Mendelian ratios – Proportion of male to female


birth for the same community over a period of years.

2. Anthropological data: height, weight, etc.

3. Social and economic data: rates of birth, marriage, or death under


certain constant conditions: wages

4. Psychological measurements: Intelligence as measured by


standard tests; speed of association, perceptions span; educational
test scores in spelling, etc.

5. Errors of observation: measures of height, speed of movement,


linear magnitude, physical and mental traits.

APPLICATIONS OF THE NORMAL PROBABILITY CURVE

A number of problems may readily be solved if we can assure that our


obtained distributions can be treated as normal, or as approximately normal.
Each general problem will be illustrated by several examples. Constant reference
will be made to Table A; and a knowledge of how to use this table is essential.

1. To determine the percentage of cases in a normal distribution which


falls within given limits?

Problem: Given a normal distribution with M = 20 and SD = 5

a. What percent of the cases fall between 15 and 25?

b. What percent of the cases lie above 30?

c. What percent of the cases lie below 12?

Solution:

a. Score 25 – Mean (20) = 5, and score 15 – Mean (20) = -5. Divide each difference by the SD (5). The quotients are 1SD and -1SD, respectively: score 25 is 1SD above the mean and score 15 is 1SD below the mean. From Table A, 1SD includes 34.13% of the cases above the mean and -1SD includes 34.13% of the cases below the mean. Adding 34.13% and 34.13%, the sum, 68.26%, represents the cases that fall between 15 and 25.

b. Score 30 is 10 points or 2SD above the mean. From the table 47.72%
of the cases fall between the mean and 2SD. Accordingly, 2.28% (50%-
47.72%) of the cases lie above 30.

c. Score 12 is 8 points or -1.6SD from the mean. Between the mean and -
1.6SD are 44.52% of the case. Hence, 50% - 44.52 or 5.48% of the cases
lie below 12.
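The three answers can be verified by reading the areas from the normal curve directly instead of Table A. The Python sketch below uses the standard-library NormalDist class for illustration; the variable names are assumptions.

```python
from statistics import NormalDist

# Problem: M = 20, SD = 5
dist = NormalDist(mu=20, sigma=5)

between_15_25 = dist.cdf(25) - dist.cdf(15)    # (a) about 0.6827, i.e. 68.26%
above_30 = 1 - dist.cdf(30)                    # (b) about 0.0228, i.e. 2.28%
below_12 = dist.cdf(12)                        # (c) about 0.0548, i.e. 5.48%

print(f"{between_15_25:.4f} {above_30:.4f} {below_12:.4f}")
```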

2. To find the limits in any normal distribution which includes a given


percentage of the cases?

Problem:

Given a distribution with M = 20 and SD = 5. Assuming normality,


what limits will include the middles 65% of the cases?

Solution:

The middle 65% of the cases includes 32.5% above and 32.5% below the mean. From Table A, 32.5% of the distribution is very close to 32.38%, which corresponds to .93SD. The middle 65% of the cases, therefore, lies between the mean and ±.93SD or, since SD equals 5, between the mean and ±4.65 points. Adding 4.65 to the mean (20) gives 24.65, and subtracting 4.65 from the mean gives 15.35. Therefore the middle 65% of the cases lies between 15.35 and 24.65.
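The same limits can be found with the inverse normal function rather than the table; the short Python sketch below is illustrative only.

```python
from statistics import NormalDist

# Limits including the middle 65% of a normal distribution with M = 20, SD = 5.
dist = NormalDist(mu=20, sigma=5)
lower = dist.inv_cdf(0.175)       # 17.5% of cases lie below the lower limit
upper = dist.inv_cdf(0.825)       # 17.5% lie above the upper limit
print(f"{lower:.2f} to {upper:.2f}")   # roughly 15.3 to 24.7
```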

3. To determine the relative difficulty of test questions, problems, and other


test items.

Problem:
Give a test question or problem solved by 10% of the group; a
second problem solved by 20% of the group and a third, solved by 40% of
the group. What is the relative difficulty of questions 1, 2, and 3?
Solution: Question 1 is passed by 10% and failed by 90% of the group. The highest 10% of the group has 40% of the cases between its lower limit and the mean (50% - 10% = 40%). From the table, 39.97%, or about 40%, of the cases fall between 1.28SD and the mean. Accordingly, 1.28SD is the difficulty value of question 1.
Following the same procedure, question 2, passed by 20% of the group, falls at a point 30% above the mean (50% - 20% = 30%); from the table, about 30% (29.95%) of the cases fall between the mean and .84SD, so .84SD is the difficulty value of question 2. Question 3, passed by 40% of the group, falls at a point 10% above the mean (50% - 40% = 10%); from the table, 9.87%, or about 10%, of the cases fall between the mean and .25SD. Therefore question 3 has a difficulty value of .25SD.

The SD gives the real index of difficulty of test questions, and not
the percent of passing or failing.

4. To separate a given group into subgroups according to capacity, when


the trait is normally distributed.

Table A. Fractional parts of the total area under the normal


probability curve, corresponding to distance on the baseline
between the mean and successive points laid off from the mean in
units of standard deviation.

Example: Between the mean and a point 1.380SD = 1.38 are


found 41.62% of the entire area under the curve.

X .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

0.0 0000 0040 0080 0120 0160 0199 0239 0279 0319 0359
0.1 0398 0438 0478 0517 0557 0596 0636 0675 0714 0753
0.2 0793 0832 0871 0910 0948 0987 1026 1064 1103 1141
0.3 1179 1217 1255 1293 1331 1368 1406 1443 1480 1517
0.4 1554 1591 1628 1664 1700 1736 1772 1808 1844 1879
0.5 1915 1950 1985 2019 2054 2088 2123 2157 2190 2224
0.6 2257 2291 2324 2357 2389 2422 2454 2486 2517 2549
0.7 2580 2611 2642 2673 2704 2734 2764 2794 2823 2852
0.8 2881 2910 2939 2967 2995 3023 3051 3078 3106 3133
0.9 3159 3186 3212 3238 3264 3289 3315 3340 3365 3389
1.0 3413 3438 3461 3485 3508 3531 3554 3577 3599 3621
1.1 3643 3665 3686 3708 3729 3749 3770 3790 3810 3830
1.2 3849 3869 3888 3907 3925 3944 3962 3980 3997 4015
1.3 4032 4049 4066 4082 4099 4115 4131 4147 4162 4177
1.4 4192 4207 4222 4236 4251 4265 4279 4292 4306 4319
1.5 4332 4345 4357 4370 4383 4394 4406 4418 4429 4441
1.6 4452 4463 4474 4484 4495 4505 4515 4525 4535 4545
1.7 4554 4564 4573 4582 4591 4599 4608 4616 4625 4633
1.8 4641 4649 4656 4664 4671 4678 4686 4693 4699 4706
1.9 4713 4719 4726 4732 4738 4744 4750 4756 4761 4767
2.0 4772 4778 4783 4788 4793 4798 4803 4808 4812 4817
2.1 4821 4826 4830 4834 4838 4842 4846 4850 4854 4857
2.2 4861 4864 4868 4871 4875 4878 4881 4884 4887 4890
2.3 4893 4896 4898 4901 4904 4906 4909 4911 4913 4916
2.4 4918 4920 4922 4925 4927 4929 4931 4932 4934 4936
2.5 4938 4940 4941 4943 4945 4946 4948 4949 4951 4952
2.6 4953 4955 4956 4957 4959 4960 4961 4962 4963 4964
2.7 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974
2.8 4974 4975 4976 4977 4977 4978 4979 4979 4980 4981
2.9 4981 4982 4982 4983 4984 4984 4985 4985 4986 4986
3.0 4986.5 4986.9 4987.4 4987.8 4988.2 4988.6 4988.9 4989.3 4989.7 4990.0
3.1 4990.3 4990.6 4991.0 4991.3 4991.6 4991.8 4992.1 4992.4 4992.6 4992
3.2 4993.129
3.3 4996.166
3.4 4996.631
3.5 4997.674
3.6 4998.409
3.7 4998.922
3.8 4999.277
3.9 4999.519
4.0 4999.683
4.5 4999.966
5.0 4999.997133

Name Course Date

Name Course Date

COEFFICIENT OF CORRELATION

Correlation is the relationship between two or more series of measures of


the same individuals. In correlation, paired facts are studied. Thus, pupil’s marks
in one subject are compared with their marks in another subject. The degree or
amount of relationship is expressed by the coefficient of correlation, an index of
relationship.

The coefficient of correlation ranges in value from +1.00 (perfect positive


correlation), through zero (no correlation) to -1.00 (perfect negative correlation).
When a pupil who gets high marks in one subject also gets high marks in another subject, it is an example of positive correlation, implying a direct relationship.

Negative correlation implies an inverse relationship: when a pupil who gets low marks in one subject gets high marks in another subject, that is a case of inverse relationship.

The zero coefficient of correlation denotes no correlation. There is no


definite direction of increase or decrease in one subject over the other.

Case 1 Case 2 Case 3


Pupil A B A B A B
A 15 53 15 68 15 104
B 14 52 14 67 14 103
C 13 51 13 66 13 102
D 12 50 12 65 12 101
E 11 49 11 64 11 100

Positive Correlation     Negative Correlation     Zero Correlation

R = 1.00                 R = -1.00                R = 0.00

The coefficient of correlation is used in determining the validity, reliability, and objectivity of a test prepared. By correlating the results of the test prepared with the results of a valid criterion in the same subject, the validity of the test prepared is determined; if the coefficient of correlation between them is not less than 0.85, the test is more or less valid. The reliability of a test is found by comparing the results of the test prepared with those of a reliable test in the same subject. If the agreement between the scores assigned by two or more correctors is high, the test prepared is objective; if the agreement is low, the test is subjective.

The coefficient of correlation is also used for prediction. A pupil takes the test in English literature but is absent for the test in American literature. With the reliability of each of these tests and the relationship between them known, it is possible to predict the probable score of the pupil in the test in American literature by substituting the values in the regression equation, which is not within the scope of this book.

Coefficient of Correlation by the Rank Difference Method or RHO

Procedure:
1. Rank the scores in each test and enter the corresponding ranks in
columns “Rank X” and “Rank Y”.
2. Find the difference in ranks by subtracting algebraically the ranks in one
test from the ranks in the other set and write difference in column D.
3. Square each of the difference in rank and enter it in column D 2.

4. Take the sum of the squares of the differences in rank (ΣD²) and the number of pairs (N); substitute their values in the formula:

ρ (RHO) = 1 – 6ΣD² / [ N(N² – 1) ]
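The procedure is sketched below in Python for illustration, using the Test X and Test Y scores of Exercise 21a; the tie-averaging helper is an assumption introduced only to reproduce the fractional ranks shown in that exercise.

```python
# Rank-difference method:  rho = 1 - 6*sum(D^2) / (N*(N^2 - 1))
# Tied scores receive the average of the ranks they occupy.
def average_ranks(values):
    order = sorted(values, reverse=True)                      # rank 1 = highest
    return [(2 * order.index(v) + 1 + order.count(v)) / 2 for v in values]

test_x = [15, 14, 13, 12, 11, 11, 11, 10, 10, 10, 9, 9, 8, 7, 7]
test_y = [12, 14, 10, 8, 12, 9, 12, 8, 10, 9, 8, 7, 7, 8, 6]

rx, ry = average_ranks(test_x), average_ranks(test_y)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(test_x)
rho = 1 - 6 * d2 / (n * (n * n - 1))
print(f"sum D^2 = {d2}, rho = {rho:.2f}")     # 112.0 and 0.80, as in Exercise 21a
```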

Table of Interpretation
Any coefficient of correlation that is not zero and that is also statistically
significant denotes some degree of relationship between two variables.

Less than 0.20     Slight correlation, negligible relationship
0.20 – 0.39        Low correlation, definite but small relationship
0.40 – 0.69        Moderate correlation, substantial relationship
0.70 – 0.89        High correlation, marked relationship
0.90 – 1.00        Very high correlation, very dependable relationship

Name Course Date

EDUCATION 602- Statistics

Coefficient of Correlation by the Rank Difference Method or Rho


Exercise 21a

Pupils Test X Test Y Rank X Rank Y Difference D2


A 15 12 1 3 -2 4
B 14 14 2 1 1 1
C 13 10 3 5.5 -2.5 6.25
D 12 8 4 10.5 -6.5 42.25
E 11 12 6 3 3 9
F 11 9 6 7.5 -1.5 2.25
G 11 12 6 3 3 9
H 10 8 9 10.5 -1.5 2.25
I 10 10 9 5.5 3.5 12.25
J 10 9 9 7.5 1.5 2.25
K 9 8 11.5 10.5 1 1
L 9 7 11.5 13.5 -2 4
M 8 7 13 13.5 -0.5 0.25
N 7 8 14.5 10.5 4 16
O 7 6 14.5 15 -0.5 0.25

N = 15     ΣD² = 112

ρ (RHO) = 1 – 6ΣD² / [ N(N² – 1) ]
        = 1 – 6(112) / [ 15(15² – 1) ]
        = 1 – 672 / 3360
        = 1 – 0.20
        = 0.80     High positive correlation indicating marked relationship.

COEFFICIENT OF CORRELATION BY THE SPEARMAN “FOOTRULE”


FORMULA

Procedure:

1. Rank the scores in each test and enter the corresponding ranks in
columns “Rank X” and “Rank Y”.

2. Subtract the ranks in one test algebraically from the ranks in the other test, but enter only the positive differences, as these are the gains in rank of one test over the other (Col. G).

3. Find the sum of the gains in rank (ΣG) and the number of pairs (N) and substitute their values in the formula:

R = 1 – 6ΣG / (N² – 1)

4. The result is the Spearman’s coefficient of correlation.
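For illustration, the footrule formula is applied below to the ranks of Exercise 21b (ties already averaged); the hard-coded rank lists are taken from that exercise.

```python
# Spearman "footrule": R = 1 - 6*sum(G) / (N^2 - 1), where G keeps only the
# positive rank differences (gains).  Ranks are those of Exercise 21b.
rank_x = [1, 2, 3, 4, 6, 6, 6, 9, 9, 9, 11.5, 11.5, 13, 14.5, 14.5]
rank_y = [3, 1, 5.5, 10.5, 3, 7.5, 3, 10.5, 5.5, 7.5, 10.5, 13.5, 13.5, 10.5, 15]

gains = sum(max(x - y, 0) for x, y in zip(rank_x, rank_y))
n = len(rank_x)
r = 1 - 6 * gains / (n * n - 1)
print(f"sum G = {gains}, R = {r:.2f}")        # 17.0 and 0.54, as in Exercise 21b
```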

Name Course Date

EDUCATION 602- STATISTICS

Coefficient of Correlation by the Spearman “Footrule” Formula


Exercise 21b

Pupils Test X Test Y Rank X Rank Y Difference Gain


A 15 12 1 3 -2 0
B 14 14 2 1 1 1
C 13 10 3 5.5 -2.5 0
D 12 8 4 10.5 -6.5 0
E 11 12 6 3 3 3
F 11 9 6 7.5 -1.5 0
G 11 12 6 3 3 3
H 10 8 9 10.5 -1.5 0
I 10 10 9 5.5 3.5 3.5
J 10 9 9 7.5 1.5 1.5
K 9 8 11.5 10.5 1 1
L 9 7 11.5 13.5 -2 0
M 8 7 13 13.5 -0.5 0
N 7 8 14.5 10.5 4 4
O 7 6 14.5 15 -0.5 0

N = 15     ΣG = 17

R = 1 – 6ΣG / (N² – 1)
  = 1 – 6(17) / (15² – 1)
  = 1 – 102 / 224
  = 1 – 0.46
R = 0.54     Moderate correlation indicating substantial relationship
THE PRODUCT MOMENT COEFFICIENT OF CORRELATION

The product – moment coefficient of correlation may be thought of


essentially as that ratio which expresses the extent to which changes in one
variable are accompanied by or are dependent upon, changes in a second
variable.
The sum of the deviations from the mean (raised to some power), divided by N, is called a moment. When the corresponding deviations in x and y are multiplied together, summed, and divided by N to give Σxy/N, the term product-moment is used.
The coefficient of correlation, r, is often called the “Pearson r” after Professor Karl Pearson, who developed the product-moment method following the earlier work of Galton and Bravais.

Procedure:
1. Compute the arithmetic mean of each test by the long method.

2. Subtract algebraically the mean of each test from every score in that test to get the corresponding deviations x and y.

3. Square the deviations in each test and sum them up (Cols. x² and y²).

4. Multiply each deviation in one test by the corresponding deviation in the other test and add the products algebraically (Σxy).

5. Divide the sum of the squared deviations (Σx² and Σy²) of each test by N and extract the square root of each quotient to get the sigmas (σx and σy).

6. Divide the mean of the products (Σxy/N) by the product of the sigmas of the two tests; or, equivalently, divide Σxy by the square root of the product Σx² × Σy². The result is the Pearson coefficient of correlation.

r = ( Σxy / N ) / ( √(Σx²/N) × √(Σy²/N) )
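The six steps are sketched below in Python using the Test X and Test Y scores of Exercise 21c; it is an illustrative check, and the square-root form of the formula is used because it avoids rounding the means.

```python
import math

# Product-moment r:  r = (sum(xy)/N) / (sigma_x * sigma_y),
# which reduces to r = sum(xy) / sqrt(sum(x^2) * sum(y^2)).
test_x = [15, 14, 13, 12, 11, 11, 11, 10, 10, 10, 9, 9, 8, 7, 7]
test_y = [12, 14, 10, 8, 12, 9, 12, 8, 10, 9, 8, 7, 7, 8, 6]

n = len(test_x)
mx, my = sum(test_x) / n, sum(test_y) / n
dx = [x - mx for x in test_x]                 # step 2: deviations
dy = [y - my for y in test_y]

sxy = sum(a * b for a, b in zip(dx, dy))      # step 4
sx2 = sum(a * a for a in dx)                  # step 3
sy2 = sum(b * b for b in dy)
r = sxy / math.sqrt(sx2 * sy2)                # step 6
print(f"r = {r:.2f}")                         # about 0.78, as in Exercise 21c
```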

Name Course Date

EDUCATION 602

The Product – Moment Coefficient of Correlation


Exercise 21c

Pupils Test X Test Y X Y XY X2 Y2


A 15 12 4.5 2.7 12.15 20.25 7.29
B 14 14 3.5 4.7 16.45 12.25 22.09
C 13 10 2.5 0.7 1.75 6.25 0.49
D 12 8 1.5 -1.3 -1.95 2.25 1.69
E 11 12 0.5 2.7 1.35 0.25 7.29
F 11 9 0.5 -0.3 -0.15 0.25 0.09
G 11 12 0.5 2.7 1.35 0.25 7.29
H 10 8 -0.5 -1.3 0.65 0.25 1.69
I 10 10 -0.5 0.7 -0.35 0.25 0.49
J 10 9 -0.5 -0.3 0.15 0.25 0.09
K 9 8 -1.5 -1.3 1.95 2.25 1.69
L 9 7 -1.5 -2.3 3.45 2.25 5.29
M 8 7 -2.5 -2.3 5.75 6.25 5.29
N 7 8 -3.5 -1.3 4.55 12.25 1.69
O 7 6 -3.5 -3.3 11.55 12.25 10.89

N = 15     ΣX = 157     ΣY = 140     Σxy = 58.65     Σx² = 77.75     Σy² = 73.35

MX = ΣX / N = 157 / 15 = 10.5          MY = ΣY / N = 140 / 15 = 9.3



r = ( Σxy / N ) / ( σx × σy )

  = ( 58.65 / 15 ) / ( √(77.75/15) × √(73.35/15) )

  = 3.91 / ( √5.18 × √4.89 )

  = 3.91 / √25.33

  = 3.91 / 5.03

  = 0.777

r = 0.78     High positive correlation indicating a direct, marked relationship.

TRANSMUTATION OF RAW SCORES INTO RATING BY


THE LONG OR SPREAD METHOD

Procedure:
1. Determine the highest score and the lowest score.

2. Beginning from the highest score, write the number
consecutively down to the lowest score (top to bottom).

3. Tally the raw score into this consecutive number


distribution. Then summarize the tallies under column f.

4. Determine the rating scale for the period. (This is such


usually supplied by the principal or group chairman) such
as the 7 point scale or 5 point scale. Compute the required
percentages and indicate the number of scores necessary
in every percentage with brackets in the distribution of
scores.

5. Assign the upper and lower rating limits of every step


opposite the upper and lower score limits respectively.

6. Compute the difference between the upper and lower rating


limits and the range between the upper and lower limits of
every steps. Make the difference the numerator and the
range the denominator of the fractional part which is to be
consecutively added to the lower rating limit of the step
until the upper rating limit is reached.

7. Round the rating opposite every tallied number in the


distribution, consider a fraction if the numerator is at least
one – half of the denominator.

Name Course Date

EDUCATION 602

135
Transmutation of Raw Scores into Ratings by Long or Spread Method
Exercise 22a

HS= 59
LS= 12

Group A 5% =6 91-95
Group B 25% = 26 86-90
Group C 40% = 40 80-85
Group D 20% = 19 75-79
Group E 10% =9 70-74

Group A Equivalent Group C Equivalent


59 1 95 95 39 2 85 85
58 0 94 5/9 95 95 38 6 84 4/9 84 85
57 1 94 1/9 94 - 91 37 5 83 8/9 84 - 80
56 0 93 2/9 93 04 36 3 83 3/9 83 05
55 0 93 2/9 93 35 3 82 7/9 83
54 0 92 7/9 93 59 34 8 82 2/9 82 39
53 0 92 3/9 92 - 50 33 3 81 6/9 82 - 30
52 1 91 8/9 92 09 32 4 81 1/9 81 09
51 2 91 4/9 91 31 4 80 5/9 81
50 1 91 91 4/9 30 2 80 80 5/9
Group B Equivalent Group D Equivalent
49 1 90 90 29 4 79 79
48 1 89 5/9 90 90 28 0 78 5/9 79 79
47 0 89 1/9 89 - 86 27 3 78 1/9 78 - 75
46 3 88 3/9 88 04 26 3 77 6/9 78 04
45 3 87 3/9 87 25 2 77 2/9 77
44 1 86 8/9 87 49 24 2 76 7/9 77 29
43 3 87 3/9 87 - 40 23 1 76 3/9 76 - 20
42 2 86 8/9 87 09 22 1 75 8/9 76 09
41 5 86 4/9 86 21 1 75 4/9 75
40 7 86 86 4/9 20 2 75 75 4/9
Group E Equivalent
19 1 74 74 15 1 72 5/7 73 19
18 2 74 3/7 74 74 14 2 72 1/7 72 - 12
17 0 73 6/7 74 - 70 13 0 71 4/7 71 07
16 2 73 2/7 73 04 12 1 70 70
4/7

136
Name Course Date

EDUCATION 602

Transmutation by the Mean and SD of the Distribution


Exercise 22b

100 Raw Scores M = 34.36


HS = 59 SD = 9.76
LS = 12
Limits: Groups Grades
M+1.5SD to HS A 88-90
M+.5SD to M+1.5SD B 85-87
M-.5SD to M+.5SD C 80-84
M-1.5SD to M-.5SD D 77-79
LS to M-1.5SD E 75-76

Limits: Scores Ratings Scores


Ratings
49-59 59 90 90 33 81 7/9 82
39-48 58 89 4/5 90 32 81 3/9 81
29-38 57 89 3/5 90 31 80 8/9 81
20-28 56 89 2/5 89 30 80 4/9 80
12-19 55 89 1/5 89 29 80 80
Computations 54 89 89 28 79 79
9.76 x .5 = 4.880 53 88 4/5 89 27 78 ¾ 79
9.76x 1.5 = 14.640 52 88 3/5 89 26 78 2/4 78
34.36+14.64 = 49.00 51 88 2/5 89 25 78 1/4
34.36+4.88 = 39.48 50 88 1/5 88 24 78 78
34.36-4.88 = 29.00 49 88 88 23 77 ¾ 78
34.36-14.64 = 19.7248 87 87 22 77 2/4 78
47 86 7/9 87 21 77 ¼ 77
46 86 5/9 87 20 77 77
45 86 3/9 86 19 76 76
44 85 1/9 86 18 75 6/7 76
43 85 8/9 86 17 75 5/7 76
42 85 6/9 85 16 75 4/7 76
41 85 4/9 85 15 75 3/7 75
40 85 2/9 85 14 75 2/7 75
39 85 85 13 75 1/7 75
38 84 84 12 75 75
37 83 5/9 84
36 83 1/9 83
35 82 6/9 83
34 82 2/9 82

137
TRANSMUTATION OF RAW SCORES INTO RATINGS BY SHORT CUT
METHOD

Procedure:

1. Determine the highest and lowest scores. Find the range

2. Determine the highest and lowest ratings for the period. Find their
difference.

3. Divide the range by the difference to determine the size of class


interval (i) for a number of steps equal to the difference when the
remainder of the division operation is subtracted from the divisor.

4. If the lowest score is zero or extremely low in comparison to the


next consecutive number, distributed as many steps as is required
with the I computed in step 3. If the lowest score is not extremely low
begin the distribution with it.

5. Add 1 (constant) to the I in step 3 to determine the new I of the


remaining steps in the distribution. The number of steps using the
new I is equal to the remainder of the division operation. Continue
the distribution with this new I and this number of step. If the lowest
score has been written alone, the upper limit of the last or highest
step in the distribution is equal to the highest score. But if the lowest
score has been included in the first step of the distribution, the
highest score is written alone after the last step.

6. Assign the rating. The lowest rating is assigned to the lowest score
or lowest step, as the case may be, the next consecutive rating to
the step next to the lowest score or step and so on or score as the
case may be.

138
Note:

If the quotient, when the range (step 1) is divided by the difference (step 2), is exact (that is, there is no remainder), then the quotient is the i of the whole distribution and the number of steps is equal to the divisor. In that case it is not necessary to follow step 5.

139
Name Course Date

EDUCATION 602

Transmutation by Short Method


Exercise 22c

Given:
N = 100 Scores Equivalent Rating
HS = 59 - 60 95
LS = 12 57 – 59 94
HR = 95 54 – 56 93 i = 3 (2 steps)
LR = 70 52 – 53 92 i = 2 (23 steps)
50 – 51 91
1. Determine the Range 48 – 49 90
HS= 59 46 – 47 89
LS= - 12 44 – 45 88
47 42 – 43 87
2. Determine the range of 40 – 41 86
ratings for the period. 38 – 39 85
HR= 95 36 – 37 84
LR= - 70 34 – 35 83
25 32 – 33 82
30 – 31 81
47/25 = 1.88 or 2 = 1st i
25-2 = 23 26 – 27 79
24 – 25 78
23 = No. of steps with an I of 2 22 – 23 77
2 = 1st i + 1 = 3 20 – 21 76
3 = 2nd i 18 – 19 75
2 = No. of steps with an I of 3 16 – 17 74
14 – 15 73
12 – 13 72
10 – 11 71
08 – 09 70

140
PART II

MODULES FOR
STATISTICAL
METHOD

Modules for Statistical Methods

STEPS IN HYPOTHESIS TESTING:

1. Construct the following hypothesis:


a. Research hypothesis (RH)
b. Null Hypothesis (Ho)
c. Alternate Hypothesis (Ha)
2. Set level of Significance:
(Set level of .05 for behavioral science researches: .01 in experimental
studies)
3. Determine the test statistic to be used.
4. Determine the critical value based on the degrees of freedom (df).
5. Compute the values needed for testing.
6. Test for significance and give findings:
a. Computed value > tabled value (significant)
b. Computed value < tabled value (not significant)
7. Give decisions:
A. Reject Ho if:
Computed value > t.v. (.05 or .01)
B. Accept Ho if:
Computed value < t.v. (.05 or .01)
8. Interpret findings.
9. Give implications of findings.

EXERCISES
1. Recall two research studies you have read recently.
a. What would be the hypothesis you can generate from such studies:
1. Research hypothesis
2. Null hypothesis
3. Alternate hypothesis
b. Submit one good research study you would like to go into, what
would be your:
1. Research hypothesis

2. Null hypothesis
3. Alternate hypothesis

MODULE I
I. How to compute chi-square (one sample): testing the significant difference between responses within a group.

Formula:  x² = Σ (O – E)² / E

x² = chi-square
Σ = sum of
O = observed frequencies
E = expected frequencies
II. Problems and Hypothesis (Null)
A. Null hypothesis (Ho): There will be no significant difference for each
of three kinds of responses.
B. Problem:
Thirty prospective teachers were asked their opinion about the desirability of introducing technological innovations in the classroom.
1. What is the extent of the teachers' opinion on technological innovations in the classroom?
2. Test at the .05 level if a significant difference exists among the teachers' responses.
III. Statistical Procedures:

Step 1: Record expected and observed frequencies as follows:


f Agree Undecided Disagree Total
0 20 5 5 30
E 10 10 10 30
Step 2: O – E              10       -5       -5
Step 3: (O – E)²          100       25       25
Step 4: (O – E)² / E       10      2.5      2.5
Step 5: Substitute the numbers in the chi-square formula and perform the indicated operations.
x² = 10 + 2.5 + 2.5 = 15
x² = 15

Step 6: Consult Chi-Square table with equal degree of freedom.
df= (R – 1) (C – 1) Tabled value (Critical Value) at
df= (2 – 1) (3 – 1) 2df = 5.99
df= 2

Step 7: Match computed value with critical value at .05:


Computed value Critical Value at .05
2
x = 15 df= 2
2
IV. Findings: x² (15) > critical value at .05 (5.99)
V. Decision: Reject Ho.
There is a significant difference in opinions among the teachers.
VI. Interpretation/Implication: Teachers differ significantly in their opinion on the use of technological innovations in the classroom.
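The whole computation in Steps 1 through 5 can be reproduced with a few lines of Python; the sketch below uses the 20/5/5 responses of the example, with the expected frequencies spread equally over the three categories.

```python
# One-sample chi-square:  x^2 = sum((O - E)^2 / E)
observed = [20, 5, 5]                              # Agree, Undecided, Disagree
n = sum(observed)
expected = [n / len(observed)] * len(observed)     # 10, 10, 10

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi2:.2f}")   # 15.00, which exceeds 5.99 (critical value, df = 2)
```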
VII. Computing the weighted mean of the response to determine the extent
of opinion of teachers towards the use of technological innovations in
the classrooms.
Formula:  weighted X̄ = Σfw / N
Ʃ= sum
f= frequency
w= weight
N= number of respondents
Give higher values in points to positive responses, lower values to
negative responses, such as:
agree = 3 undecided = 2 disagree = 1

Table of Computation of weighted X


Responses f w fw weighted X
Agree 20 3 60
Undecided 5 2 10 75/30 = 2.5 (Agree)
Disagree 5 1 5
Total 30

Numerical range of cut-off scores for 3 levels of descriptive rating


2.34 – 3.00 Agree
1. 67 – 2.33 Undecided
1.00 – 1.66 Disagree
Therefore teachers significantly agree on the use of technological
innovations in the classroom.

For five (5) levels of descriptive ratings, the following may be use
4.21 – 5.00 strongly agree
3.41 – 4.20 agree
2.61 – 3.40 undecided
1.81 – 2.60 disagree
1.00 – 1.80 strongly disagree

EXERCISE:
A. Problem: A survey questionnaire was given to college teachers on the issue: “It is best for students to be told what subjects to take rather than have them choose for themselves.” Test at the .05 level whether the responses of the respondents differ significantly.
B. Data:
Strongly Agree – 325 x 5 = 1625
Agree – 584 x 4 = 2336
Uncertain – 189 x 3 = 567
Disagree – 286 x 2 = 572
Strongly Disagree – 82 x 1 = 82
Total: 1466             5182
Questions:
1. Is there a significant difference in responses among the college teachers?
Support your answer.
2. What is the level of attitude of respondents?
3. If you were the college president, would you adopt a fixed curriculum for students to follow strictly, based on the findings?

Module 2
How to compute the chi-square (two or more samples) testing the significant
between two or more groups.

I. Formula:  x² = Σ (O – E)² / E     where: x² = chi-square; Σ = sum of; O = observed frequency; E = expected frequency

II. Formula and Hypothesis (Null)


A. Null Hypothesis (Ho): There is no significant difference in opinions among the three groups of respondents towards the teaching of sex education in school.
B. Problem: Teachers were asked if they favor the teaching of sex education in school. Test at the .05 level whether a significant difference in attitude exists among the teachers.
C. Data:    Scales          SA     A      U      D     SD    Total
            College         15     6     25     18     11      75
            High School     32    12     28     13      9      94
            Elementary      26    18     29     15     10      98
            Total           73    36     82     46     30     267
D. Questions:
1. Are the attitudes of the 3 levels of teachers significantly different?
2. Is there a significant difference in attitude in each following
group:
a. Elementary Teachers
b. High School Teachers
c. College Teachers

3. What is the extent of attitude of each group of teachers?


a. Elementary b. High School c, College

4. Based on the findings, are you in favor of teaching sex education in our schools?

III. Statistical Procedure:
Step 1. Record observed frequencies as follows:

Teachers/Scales     SA       A        U        D        SD      TOTAL

College             15(a)    6(b)     25(c)    18(d)    11(e)     75

High School         32(f)    12(g)    28(h)    13(i)     9(j)     94

Elementary          26(k)    18(l)    29(m)    15(n)    10(o)     98

TOTAL               73       36       82       46       30       267

Step 2. Compute the “e” of each cell. The “e” of each cell is
obtained by multiplying the marginal sums and divide this by the total number of
respondent (N).

cell (a) = (73 × 75)/267 = 20.51     cell (f) = (73 × 94)/267 = 25.70     cell (k) = (73 × 98)/267 = 26.79

cell (b) = (36 × 75)/267 = 10.11     cell (g) = (36 × 94)/267 = 12.67     cell (l) = (36 × 98)/267 = 13.21

cell (c) = (82 × 75)/267 = 23.03     cell (h) = (82 × 94)/267 = 28.87     cell (m) = (82 × 98)/267 = 30.10

cell (d) = (46 × 75)/267 = 12.92     cell (i) = (46 × 94)/267 = 16.19     cell (n) = (46 × 98)/267 = 16.88

cell (e) = (30 × 75)/267 = 8.43      cell (j) = (30 × 94)/267 = 10.56     cell (o) = (30 × 98)/267 = 11.01

Step 3. Substitute the “e” in the chi – square formula and perform
the indicated operations.

cell (a) = (15 – 20.51)²/20.51 = 1.48     cell (b) = (6 – 10.11)²/10.11 = 1.67     cell (c) = (25 – 23.03)²/23.03 = 0.17

cell (d) = (18 – 12.92)²/12.92 = 2.00     cell (e) = (11 – 8.43)²/8.43 = 0.78      cell (f) = (32 – 25.70)²/25.70 = 1.54

cell (g) = (12 – 12.67)²/12.67 = 0.04     cell (h) = (28 – 28.87)²/28.87 = 0.03    cell (i) = (13 – 16.19)²/16.19 = 0.63

cell (j) = (9 – 10.56)²/10.56 = 0.23      cell (k) = (26 – 26.79)²/26.79 = 0.02    cell (l) = (18 – 13.21)²/13.21 = 1.74

cell (m) = (29 – 30.10)²/30.10 = 0.04     cell (n) = (15 – 16.88)²/16.88 = 0.21    cell (o) = (10 – 11.01)²/11.01 = 0.09

X² = 1.48 + 1.67 + 0.17 + 2.00 + 0.78 + 1.54 + 0.04 + 0.03 + 0.63 + 0.23 + 0.02 + 1.74 + 0.04 + 0.21 + 0.09
X² = 10.67

Step 4. Match the computed value with the critical value.

Computed value: X² = 10.67          Critical value at .05:
                                    df = (R – 1)(C – 1) = (3 – 1)(5 – 1) = (2)(4) = 8
                                    critical value at .05 (df = 8) = 15.51

Findings:
X² (10.67) is less than the critical value at .05 (15.51).

Decision: Accept Ho.
There is no significant difference in opinion among the three groups.

Interpretation/Implication:
The opinions of the three groups of teachers are comparable; the teaching of sex education in school is agreeable to the teachers.
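The expected frequencies and the chi-square sum of Steps 2 and 3 can be checked in Python. The sketch below uses the observed table of Step 1; the variable names are assumptions, and small differences from the hand values come only from rounding the expected frequencies.

```python
# Chi-square for an r x c table: E = (row total x column total) / N,
# x^2 = sum((O - E)^2 / E).  Rows follow the Step 1 table.
table = [
    [15,  6, 25, 18, 11],   # College
    [32, 12, 28, 13,  9],   # High School
    [26, 18, 29, 15, 10],   # Elementary
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi2 = 0.0
for r, row in enumerate(table):
    for c, observed in enumerate(row):
        expected = row_totals[r] * col_totals[c] / n
        chi2 += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)
print(f"chi-square = {chi2:.2f}, df = {df}")   # about 10.67, df = 8 (critical value 15.51)
```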

Module 3
How to Compute Coefficient of Contingency

Through the Chi-Square

Problem: 1.) Is there a significant correlation between marriage adjustment and education of husbands?
2.) Test at the .05 level of significance.

Data: Education of Husbands: Marriage Adjustment Levels

Very Low Low High Very High

Grad Work 4 9 38 54

College 20 31 55 99

High School 23 37 41 51

Elementary 11 10 11 19

Questions:

1. Construct the Following


1.1. Research Hypothesis
1.2. Null Hypothesis
1.3. Alternate Hypothesis
2. Compute the Following
2.1. X2 2.3 “c” 2.5 df (x2) 2.7 CU (x2)
2.2. e 2.4 t 2.6 df (t) 2.8 CU (t)

Formula:  C = √( x² / (N + x²) )

Where:
C = coefficient of contingency
x² = chi-square
N = number of cases
II. Problem and Hypothesis: (NULL)
A. Null – Hypothesis (HO): There is no significant correlation between
marriage adjustment level and education of husbands
B. Problem: Test at .05 level of significance that There is no significant
correlation between marriage adjustment level and education of
husbands
III. Statistical Procedure
Step 1.Record observed frequencies as follows:
Education of Husband: Marriage Adjustment Levels

Very Low Low High Very High TOTAL

Grad Work 4 (a) 9 (b) 38 (c) 54 (d) 105

College 20 (e) 31(f) 55 (g) 99 (h) 205

High School 23 (i) 37 (j) 41 (k) 51 (l) 152

Elementary 11 (m) 10 (n) 11 (o) 19 (o) 51

513

Step 2. Compute the “e” of each cell by multiplying the marginal sums and dividing the product by the total number of cases (N).

Cell (a) = (58 × 105)/513 = 11.87     Cell (e) = (58 × 205)/513 = 23.18     Cell (i) = (58 × 152)/513 = 17.19     Cell (m) = (58 × 51)/513 = 5.77

Cell (b) = (87 × 105)/513 = 17.81     Cell (f) = (87 × 205)/513 = 34.77     Cell (j) = (87 × 152)/513 = 25.78     Cell (n) = (87 × 51)/513 = 8.65

Cell (c) = (145 × 105)/513 = 29.68    Cell (g) = (145 × 205)/513 = 57.94    Cell (k) = (145 × 152)/513 = 42.96    Cell (o) = (145 × 51)/513 = 14.42

Cell (d) = (223 × 105)/513 = 45.64    Cell (h) = (223 × 205)/513 = 89.11    Cell (l) = (223 × 152)/513 = 66.07    Cell (p) = (223 × 51)/513 = 22.17

Step 3. Compute  X² = Σ (o – e)² / e :
2 2
( 4−11.87) (23−17.19)
Cell (a) = =5.22 Cell (i) = =1.96
11.87 17.19

(9−17.81)2 (37−25.78)2
Cell (b) = =4.36 Cell (j) = =4.88
17.81 25.78
2 2
(38−29.68) ( 41−42.96)
Cell (c) = =2.33 Cell (k) = =0.09
29.68 42.96
2 2
(54−45.64) (51−66.07)
Cell (d) = =1.53 Cell (l) = =3.44
45.64 66.07
2 2
(20−23.18) (11−5.77)
Cell (e) = =0.44 Cell (m) = =4.74
23.18 5.77

(31−34.77)2 (10−8.65)2
Cell (f) = =0.41 Cell (n) = =0.21
34.77 8.65

Cell (g) = (55 – 57.94)²/57.94 = 0.15     Cell (o) = (11 – 14.42)²/14.42 = 0.81

Cell (h) = (99 – 89.11)²/89.11 = 1.10     Cell (p) = (19 – 22.17)²/22.17 = 0.45

X² = 5.22 + 4.36 + 2.33 + 1.53 + 0.44 + 0.41 + 0.15 + 1.10 + 1.96 + 4.88 + 0.09 + 3.44 + 4.74 + 0.21 + 0.81 + 0.45 = 32.12

IV. Matching computed Value with Table Value


Computed Value x2 = 32.12 Critical Value at 0.05
df = (R – 1) × (C – 1)
   = (4 – 1) × (4 – 1)
   = 3 × 3 = 9
Critical value at .05 (df = 9) = 16.92

V. Findings: x2 (32.12) >δ 0.05 (16.92)


The x2 value of 32.12 is greater than the critical value of δ 0.05 of 16.92.
Relationship therefore is established between marriage adjustment level and
education of husbands.

Step 4:  C = √( x² / (N + x²) ) = √( 32.12 / (513 + 32.12) ) = 0.24

Step 5:  C / 0.84 = 0.24 / 0.84 = 0.29

Step 6:  t = C √( (N – 2) / (1.00 – C) ) = 0.29 √( (513 – 2) / (1.00 – 0.29) ) = 7.78
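Step 4 reduces to one line of arithmetic; the short Python sketch below is only an illustrative check using the chi-square and N computed above.

```python
import math

# Coefficient of contingency: C = sqrt(x^2 / (N + x^2))
chi2, n = 32.12, 513
c = math.sqrt(chi2 / (n + chi2))
print(f"C = {c:.2f}")        # about 0.24
```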

Match the computed value with the table value (for the 5-level normality exercise):

Computed value: x² = 1.99          Critical value at .05:
                                   df = (R – 1)(C – 1) = (2 – 1)(5 – 1) = 4
                                   α.05 (4 df) = 9.488 (tabled value)

x² (1.99) < α.05 (9.488)

Findings: x² (1.99) is less than the critical value at the .05 level of significance (9.488).
Decision: Accept the hypothesis of no divergence; the distribution is normal.

Interpretation/Implication:
Analysis of the data has shown no significant divergence across the 5 levels of performance. In the NI (needs improvement) level the expected number of pupils (0.42) is about the same as the observed number (1); the same is true of the MS, VS, and O levels. There are only four pupils who are satisfactory (S) out of the expected 5 pupils. The distribution is normal.

Answer to Questions:
1. The graph is not normal at 3 levels of performance
2. The graph is normal at 5 levels of performance

Module 4

How to determine the profile of the Academic Performance of a group (Testing
the significant Divergence from the normal curve of distribution)

FORMULA:

x² = Σ (O – E)² / E

x² = chi-square
Σ = sum of
O = observed frequency
E = expected frequency

The expected frequencies in the test for normality are those of the normal curve of distribution.
II. PROBLEM AND HYPOTHESIS (null)
The distribution of levels of performance in a mathematics test is not
divergent from the normal curve of distribution.
(Distribution is normal)
A. Problem: Grade IV pupils were given an achievement test in
Mathematics. Test at .05 level that distribution of performance is
not divergent from the normal curve of distribution.

B. Data: Level of performance Frequencies


Above Average 18
Average 26
Below Average 4

Total
48

III. STATISTICAL PROCEDURE
Step 1. Record observed frequencies.
Perform indicated operations.
Frequencies              BA            A             AA         TOTAL
Observed (O)              4           26             18           48
Expected (E)       16% of 48 =  68% of 48 =   16% of 48 =
                       7.68         32.64           7.68          48
Step 2: O – E         -3.68         -6.64          10.32
Step 3: (O – E)²      13.54         44.09         106.50
Step 4: (O – E)²/E     1.76          1.35          13.87

The expected frequencies (E) for 5 levels of academic performance use the following proportions:
Outstanding – 3.5%;  Very Satisfactory – 24%;  Satisfactory – 45%;  Moderately Satisfactory – 24%;  Needs Improvement – 3.5%
Step 5. Substitute the numbers in the chi-square formula and perform the
indicated operation.
2
x =1.76 + 1.35 + 13.87 = 16.98

x 2=16.98
Step 6. Match computed value with critical value.
Computed value: x² = 16.98          Critical value at .05:
                                    df = (3 – 1)(2 – 1) = 2
                                    .05 (df = 2) = 5.99
2
IV. FINDINGS: x ( 16.98 ) is greater than the critical value at .05 level of

significance (5.99)
V. DECISION: Reject the hypothesis of no divergence
VI. INTERPRETATIONS/IMPLICATIONS:

Analysis of the data has shown a significant divergence in the 3 levels of performance. In the BA (below average) level there are only 4 pupils out of the expected 7.68. In the A (average) level there are only 26 out of the expected 32.64 pupils. However, there are more pupils (18) than expected (7.68) in the AA (above average) level. The distribution is skewed to the left of the normal curve of distribution.
Therefore, the academic performance of the Grade IV pupils shows that there are more bright pupils than poor ones.
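The goodness-of-fit computation against the 16% / 68% / 16% normal proportions is easy to verify; the Python sketch below uses the observed 4 / 26 / 18 frequencies of the example.

```python
# Goodness-of-fit to the normal proportions 16% / 68% / 16% (BA, A, AA).
observed = [4, 26, 18]
n = sum(observed)                                    # 48
expected = [0.16 * n, 0.68 * n, 0.16 * n]            # 7.68, 32.64, 7.68

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi2:.2f}")   # about 16.98 > 5.99, so the distribution diverges
```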

EXERCISE :
I. PROBLEM : Give the profile of the academic performance of a group
of pupils in a Reading test getting the following scores.

II. Data : Scores in a Reading Test


30, 29, 28, 25, 23, 21, 20, 19, 18, 17, 16, 12
III. Questions:
Is the group normal or not in a
a. 3 levels of performance
b. 5 levels of performance

MODULE 4a. How to convert scores into Grades: (Levels of Academic
Performance) A, B, C,
D, F
I. Formula:  X̄ ± 1.8 SD (cut-off points for A and F)
             X̄ ± 0.6 SD (cut-off points for B, C, and D)

II. Scores: 56, 53, 48, 40, 38, 37, 23, 18, 15, 8

III. Procedures:

Step 1. Compute the mean score:  X̄ = ΣX / N = 33.60

Step 2. Compute the SD:  SD = √( Σ(X – X̄)² / N ) = 15.93
Step 3. Compute the cut-off scores for A, B, C, D, F.

For A and F:   15.93 × 1.8 = 28.67        33.60 + 28.67 = 62.27 (A)
                                          33.60 − 28.67 = 4.93 (F)

For B, C, D:   15.93 × 0.6 = 9.56         33.60 + 9.56 = 43.16 (B)
                                          33.60 − 9.56 = 24.04 (D)

Step 4. Set the levels of performance with the cut-off scores.

Cut-off range              Scores             Levels with frequency
62.27 and up      (A)      —                  A – 0
43.16 – 62.26     (B)      56, 53, 48         B – 3
24.04 – 43.15     (C)      40, 38, 37         C – 3
4.93 – 24.03      (D)      23, 18, 15, 8      D – 4
Below 4.93        (F)      —                  F – 0

Summary of cut-off scores for the A, B, C, D, F levels of performance:

62.27 and up      –  A
43.16 – 62.26     –  B
24.04 – 43.15     –  C
4.93 – 24.03      –  D
Below 4.93        –  F
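For readers who want to automate the cut-off procedure, here is a minimal Python sketch (an illustration, not part of the module). It assumes the module's rule of mean ± 1.8 SD for the A/F cut-offs and mean ± 0.6 SD for the B/D cut-offs, and uses the population SD (dividing by N); small rounding differences can shift a cut-off by a fraction of a point without changing the grade counts here.

    from collections import Counter
    from statistics import mean, pstdev      # pstdev divides by N

    scores = [56, 53, 48, 40, 38, 37, 23, 18, 15, 8]
    x_bar = mean(scores)                     # 33.60
    sd = pstdev(scores)                      # about 15.9 (the module uses 15.93)

    cut_a = x_bar + 1.8 * sd                 # A: this score and above
    cut_b = x_bar + 0.6 * sd                 # B: from here up to the A cut-off
    cut_d = x_bar - 0.6 * sd                 # C: from here up to the B cut-off
    cut_f = x_bar - 1.8 * sd                 # D: from here up to the C cut-off; F below

    def grade(score):
        if score >= cut_a: return "A"
        if score >= cut_b: return "B"
        if score >= cut_d: return "C"
        if score >= cut_f: return "D"
        return "F"

    print(Counter(grade(s) for s in scores))   # B: 3, C: 3, D: 4 (no A's or F's)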

TABLE A.

Values of t (or critical ratio) at the .05 and .01 levels of
significance.

Example: When df = 20 and t = 2.09, the .05 level means that in 5 out of
100 trials a divergence as large as or larger than that obtained (plus or
minus) may be expected under the null hypothesis.

Degrees of Freedom .05 .01

Df
1 12.71 63.66
2 4.30 9.92
3 3.18 5.84
4 2.78 4.60
5 2.57 4.03
6 2.45 3.71
7 2.36 3.50
8 2.31 3.36
9 2.26 3.25
10 2.23 3.17
11 2.20 3.11
12 2.18 3.06
13 2.16 3.01
14 2.14 2.98
15 2.13 2.95
16 2.12 2.92
17 2.11 2.90
18 2.10 2.88
19 2.09 2.86
20 2.09 2.84
21 2.08 2.83
22 2.07 2.82
23 2.07 2.81
24 2.06 2.80
25 2.06 2.79
26 2.06 2.78
27 2.05 2.77
28 2.05 2.76
29 2.04 2.76
30 2.04 2.75
50 2.01 2.68
100 1.98 2.63

Over 100 1.96 2.58

TABLE B.

Values of chi-square (x²) at the .05 and the .01 levels of
significance.

Example: For 12 degrees of freedom, a computed x² must be at least as large as
21.03 to be significant at the 5% level and as large as 26.22 to be significant at
the 1% level.

Degrees of Freedom .05 .01

1 3.84 6.64
2 5.99 9.21
3 7.82 11.34
4 9.49 13.28
5 11.07 15.09
6 12.59 16.81
7 14.07 18.48
8 15.51 20.09
9 16.92 21.67
10 18.31 23.21
11 19.68 24.72
12 21.03 26.22
13 22.36 27.69
14 23.68 29.14
15 25.00 30.58
16 26.30 32.00
17 27.59 33.41
18 28.87 34.80

19 30.14 36.19
20 31.41 37.57
21 32.67 38.93

22 33.92 40.29
23 35.17 41.64
24 36.42 42.98
25 37.65 44.31
26 38.88 45.64

27 40.11 46.96
28 41.34 48.28
29 42.56 49.59
30 43.77 50.89

Module 5
How to compute the Product-Moment Correlation (r)
(Testing the significant correlation between two variables)

Formula:   r = Σ(x − X)(y − Y) / [ (N)(SDx)(SDy) ]

r = the coefficient of correlation
Σ = sum of
x − X = difference between each score on Test X and the mean of Test X
y − Y = difference between each score on Test Y and the mean of Test Y
N = the number of pairs of scores
SDx = the standard deviation of Test X
SDy = the standard deviation of Test Y
Problem and Hypothesis:
Null Hypothesis
There is no significant correlation between Test X and Test Y Scores
Problem:
Test at 0.05 level of significance that there is no significant correlation between
test x and test y scores.

Students        Test X        Test Y

A               50            60
B               60            80
C               70            90
D               80            70
E               90            100

STATISTICAL PROCEDURE:

Step 1: Prepare the following columns of figures for each test:

Students   x     X    x − X   (x − X)²    y      Y    y − Y   (y − Y)²   (x − X)(y − Y)
A          50    70   −20      400        60    80    −20      400         400
B          60    70   −10      100        80    80      0        0           0
C          70    70     0        0        90    80     10      100           0
D          80    70    10      100        70    80    −10      100        −100
E          90    70    20      400       100    80     20      400         400
Total     350                 1000       400                  1000         700

X = 350/5 = 70          Y = 400/5 = 80

Step 2: To obtain the sum of (x-X)(y-Y) multiply columns x-X and y-Y as indicated
above, then add these products:

The sum of (x-X) (y-Y) = 700

Step 3: Determine the standard deviation of each test as follows:

SDx = 14.14 SDy = 14.14

Step 4: Substitute the numbers in the formula as follows:

r = 700 / [ (5)(14.14)(14.14) ]

  = 0.70

Step 5: Test the significance of r with the t-test formula:

t = r √( (N − 2) / (1.00 − r²) )

t = 0.70 √( (5 − 2) / (1.00 − 0.70²) )

t = 1.69

Step 6: Match the computed t-value with the critical t-value.

Computed Value              Critical Value
t = 1.69                    df = N − 2 = 5 − 2 = 3
                            .05 (df 3) = 3.18

IV. Findings:
t (1.69) is less than the critical value (3.18) for the correlation between Test X
and Test Y scores.
V. Decision:
Accept Ho.

VI. Interpretation: There is no significant correlation between Test X and Test Y scores.
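As an illustrative aid only, the Module 5 computation can be reproduced with the short Python sketch below. The score lists are those of the worked example (with 100 as student E's Test Y score, consistent with the mean of 80 used above), and the t formula is the one given in Step 5.

    from math import sqrt

    x = [50, 60, 70, 80, 90]      # Test X scores
    y = [60, 80, 90, 70, 100]     # Test Y scores
    n = len(x)

    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # 700
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)                       # 14.14
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)                       # 14.14

    r = sum_products / (n * sd_x * sd_y)          # 0.70
    t = r * sqrt((n - 2) / (1 - r ** 2))          # about 1.70 (1.69 with rounded figures)

    print(f"r = {r:.2f}, t = {t:.2f}")            # compare t with 3.18 (df = 3, .05 level)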

1. Problem: Test the hypothesis of no significant correlation between Grade
IV pupils' Process and Attitude scores at the 0.05 level of significance.
2. Data:

Student Test A Test B

(Process test) (Attitude test)

A 22 35

B 16 25

C 25 30

D 9 15

E 10 20

3. Question:

A. Describe the relationship between the pupils' performance in the Process and
Attitude tests.

B. Is the correlation significant?

Answer:

STATISTICAL PROCEDURE:

Step 1: Prepare the following columns of figures for each test:

Students   x     X      x − X   (x − X)²    y     Y    y − Y   (y − Y)²   (x − X)(y − Y)
A          22    16.4    5.6     31.36      35    25    10      100         56.0
B          16    16.4   −0.4      0.16      25    25     0        0          0
C          25    16.4    8.6     73.96      30    25     5       25         43.0
D           9    16.4   −7.4     54.76      15    25   −10      100         74.0
E          10    16.4   −6.4     40.96      20    25    −5       25         32.0
Total      82                   201.20     125                  250        205.0

X = 82/5 = 16.4          Y = 125/5 = 25

Step 2: To obtain the sum of (x − X)(y − Y), multiply columns x − X and y − Y as
indicated above, then add these products:

The sum of (x − X)(y − Y) = 205

Step 3: Determine the standard deviation of each test as follows:

SDx = 6.34 SDy = 7.07

Step 4: Substitute the numbers in the formula as follows:

r = 205 / [ (5)(6.34)(7.07) ]

  = 0.91

Step 5: Test the significance of r with the t-test formula:

t = r √( (N − 2) / (1.00 − r²) )

t = 0.91 √( (5 − 2) / (1.00 − 0.91²) )

t = 3.80

Step 6: Match the computed t-value with the critical t-value.

Computed Value              Critical Value
t = 3.80                    df = N − 2 = 5 − 2 = 3
                            .05 (df 3) = 3.18

V. Findings: t (3.80) is greater than the critical value (3.18).

VI. Decision: Reject Ho.

VII. Interpretation: There is a significant correlation between the Process and
Attitude test scores.

EXERCISE

Problem:

A unit test was given to a group of Grade VI pupils in Science. Here are the
scores: 28, 30, 33, 42, 17, 18, 33, 32, 36

Questions:

Compute the following:

Mean score

Standard deviation

Cut-off scores for the levels of academic performance

How many of the students are outstanding, very satisfactory, satisfactory,
moderately satisfactory, or needing improvement; also give the distribution
in 3 levels of performance.

Answer:

Mean score = Σx / N = 29.89

Standard deviation = √( Σ(x − X)² / N ) = 7.59

Cut-off scores:

For A and F:    7.59 × 1.8 = 13.66
                29.89 + 13.66 = 43.55 (A)
                29.89 − 13.66 = 16.23 (F)

For B and D:    7.59 × 0.6 = 4.55
                29.89 + 4.55 = 34.44 (B)
                29.89 − 4.55 = 25.34 (D)

C covers the scores between the D and B cut-offs.

Setting the levels of performance with the cut-off scores:

43.55 and up        A    —
34.44 – 43.54       B    42, 36
25.34 – 34.43       C    33, 33, 32, 30, 28
16.23 – 25.33       D    18, 17
16.22 and below     F    —

Summary of cut-off scores for the A, B, C, D, F levels of performance:

43.55 – up        (A) = 0
34.44 – 43.54     (B) = 2
25.34 – 34.43     (C) = 5
16.23 – 25.33     (D) = 2
16.22 – below     (F) = 0

N = 9

5 Levels of Performance

Level                           Scores                f
Outstanding (O)                 42, 36                2
Very Satisfactory (VS)          33, 33, 32            3
Satisfactory (S)                30, 28                2
Moderately Satisfactory (MS)    18, 17                2
Needs Improvement (NI)          —                     0

N= 9

3 Levels of Performance

Level              Scores                    f
Above Average      42, 36                    2
Average            33, 33, 32, 30, 28        5
Below Average      18, 17                    2

Module 6
(Biserial r) Testing Statistically the Significant Correlation between Variable A
(Continuous) and Variable B (Discontinuous)

I. Hypothesis Problem:
Is there a relationship between music appreciation and training in music?
A. Variables:
Variable A (Continuous) – Music appreciation test score
Variable B (Discontinuous) –
1. Those with training in Music (Group 1)
2. Those without training in Music (Group 2)

B. Formula:

Biserial coefficient of correlation, or biserial r:

r(bis) = [ ( X1(p) − X2(q) ) / SD ] × ( pq / u )

II. Hypothetical Data:

Music Apprec.      With training      Without training
Scores             in Music           in Music             Total

85 – 89                 5                   6                11
80 – 84                 2                  16                18
75 – 79                 6                  19                25
70 – 74                 6                  27                33
65 – 69                 1                  19                20
60 – 64                 0                  21                21
55 – 59                 1                  16                17

Total              N1 = 21            N2 = 124             N = 145

III. Hypothesis:

There is no significant correlation between performance in music


appreciation scores and training in music.

IV. Statistical Procedures:

A. Step 1. Compute the following values:


1. X1 (p) = mean of group 1 (77.00)
2. X2 (q) = mean of group 2 (71.35)
3. SD = standard deviation of all scores (8.80)

B. Step 2. Compute p and q.

1. p = proportion of group 1 (0.145, or 14.5%)
2. q = proportion of group 2 (0.855, or 85.5%)
3. p and q as proportions of the area under the normal curve: the untrained
group (q) covers 85.5% of the area and the trained group (p) the remaining 14.5%.

C. Step 3. Compute u.

u is the height of the ordinate of the normal curve at the point dividing the
p and q areas (read from a table of ordinates of the normal curve, as in Step 3b).

Step 3a – Solve for the area above the mean (AM), using the larger group (q):

AM = (q − 50%) / 100
   = (85.5% − 50%) / 100
   = 35.5% / 100
   = 0.355

Step 3b – See the tabled value for AM = 0.355 to get the value of u.

AM = 0.355
u = 0.288

D. Step 4 – Substitute the computed values in the formula for r(bis).

r(bis) = [ ( X1(p) − X2(q) ) / SD ] × ( pq / u )

       = [ (77.00 − 71.35) / 8.80 ] × [ (0.145)(0.855) / 0.288 ]

       = (5.65 / 8.80) × (0.124 / 0.288)

       = 0.642 × 0.431

r(bis) = 0.276 or 0.28

E. Step 5 – Test the significance of r with the t-test:

t = r √( (N − 2) / (1 − r²) )

t = 0.28 √( (145 − 2) / (1 − 0.28²) )

t = 0.28 √( 143 / 0.92 )

t = 0.28 √155.43

t = 0.28 × 12.47

t = 3.49

V. Findings:
1. What is the value of r?

● r(bis) = 0.276 or 0.28

How do you describe it: low, marked, strong, high?

● Low

2. Is the correlation significant?

● Significant

Support this with your t-test findings.

● The computed t (3.49) is higher than the critical value at .05 (1.96 for
df over 100), so the correlation is significant.
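The arithmetic of this module can be verified with the Python sketch below (illustrative only, not part of the module). It reuses the summary values computed in Steps 1–3; in practice the ordinate u is read from a normal-curve table, as in Step 3b.

    from math import sqrt

    mean_trained = 77.00      # X1(p): mean of the group with training in music
    mean_untrained = 71.35    # X2(q): mean of the group without training
    sd_all = 8.80             # SD of all 145 scores
    p, q = 0.145, 0.855       # proportions of the two groups
    u = 0.288                 # ordinate of the normal curve at the p/q split

    r_bis = (mean_trained - mean_untrained) / sd_all * (p * q) / u   # about 0.28

    n = 145
    t = r_bis * sqrt((n - 2) / (1 - r_bis ** 2))                     # about 3.4

    print(f"r_bis = {r_bis:.2f}, t = {t:.2f}")    # compare t with 1.96 (.05, df over 100)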

Module 7
Testing the Significant Change in Opinions/Attitudes of a Group After
Treatment

I. Hypothetical Problem: Thirty nursing students were given a personality
inventory test before and after a Personality Development Program. Test at .05
that there is no significant change in the attitude of the subjects after the
development program.

II. Hypothetical Data:

                              AFTER
                       Negative      Positive
BEFORE   Positive      13 (A)         2 (B)
         Negative       9 (C)         6 (D)

III. Hypothesis: There is no significant change in attitude after the personality
development program.

IV. Level of Significance – set at .05

V. df = (R − 1)(C − 1)

VI. Test of Significant Change

VII. Formula:

x² = ( |A − D| − 1 )² / ( A + D )

VIII. Computation

x² = ( |13 − 6| − 1 )² / 19          df = (R − 1)(C − 1)

   = ( 7 − 1 )² / 19                    = (2 − 1)(2 − 1)

   = 36 / 19                            = (1)(1)

   = 1.89                               = 1

IX. Findings

x² (1.89) is less than the critical value at .05 (3.84).

X. Decision

Accept Ho of No Significant change

XI. Interpretation

The attitude of the thirty nursing students did not change after the
development program. Attitudes are difficult to change; it takes a longer time to
change one's attitude.
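A short Python sketch of this test of significant change follows (illustrative only, not part of the module). It implements the chi-square formula given in VII, where A and D are the cells in which attitudes changed, and uses 3.84 as the tabled chi-square value at .05 for df = 1.

    def change_chi_square(a, d):
        # Chi-square for significant change: (|A - D| - 1)^2 / (A + D)
        return (abs(a - d) - 1) ** 2 / (a + d)

    chi_sq = change_chi_square(13, 6)      # A = 13, D = 6 from the hypothetical data
    critical_05 = 3.84                     # Table B value at .05, df = 1

    print(f"chi-square = {chi_sq:.2f}")    # prints 1.89
    if chi_sq > critical_05:
        print("Reject Ho: the change in attitude is significant")
    else:
        print("Accept Ho: no significant change in attitude")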

Module 8
How to compute the t-ratio (Testing the Significant Difference between two
Mean Scores of Large N’s)

I. Formula

t = ( X1 − X2 ) / √( SD1²/N1 + SD2²/N2 )

t = t-ratio, also called the t-test or critical ratio
X1 = mean of sample 1
X2 = mean of sample 2
SD1²/N1 = squared standard error of the mean for sample 1
SD2²/N2 = squared standard error of the mean for sample 2
N1 = number of cases in sample 1
N2 = number of cases in sample 2

II. Problem and Null Hypothesis:

A. Problem:
School A and School B were compared in their performance on an
achievement test in Mathematics given to Grade VI pupils of the same
city. Test at the .05 level that there is no significant difference in mean
scores between the two schools.

B. Null Hypothesis (Ho): There is no significant difference in mean
scores between School A and School B in a Mathematics test.

C. Data:        School A        School B        Difference
    X              69              67               2
    SD             10              12
    N             100             144

IV. Statistical Procedure:

Step 1 – Determine the means of samples 1 and 2.

Step 2 – Compute the standard error of the mean for samples 1 and 2.

SE(X1) = √( (10)²/100 ) = √(100/100) = √1 = 1

SE(X2) = √( (12)²/144 ) = √(144/144) = √1 = 1

Step 3 – Substitute the numbers in the formula and perform the indicated operations.

t = (69 − 67) / √( (10)²/100 + (12)²/144 )

  = 2 / √( 1 + 1 )

  = 2 / 1.41

t = 1.42

Step 4 – Match the computed value with the critical value:

Computed Value              Critical Value at .05
t = 1.42                    df = (N1 − 1) + (N2 − 1)
                            df = (100 − 1) + (144 − 1)
                            df = 99 + 143
                            df = 242
                            .05 (df 242) = 1.96

V. Findings:

t = (1.42) is less than critical value (1.96)

VI. Decision Accept Ho.


There is no significant difference in the mean scores between School A
and School B in a Mathematics Achievement Test.

VII. Interpretation/ Implications:


The performance of pupils in School A and School B is comparable.
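The t-ratio computation for two large samples can also be carried out with the Python sketch below (an illustration, not part of the module), using the formula given in I and the School A / School B data.

    from math import sqrt

    def t_ratio(mean1, sd1, n1, mean2, sd2, n2):
        # t = (X1 - X2) / sqrt(SD1^2/N1 + SD2^2/N2)
        standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
        return (mean1 - mean2) / standard_error

    t = t_ratio(69, 10, 100, 67, 12, 144)   # School A vs. School B
    df = (100 - 1) + (144 - 1)              # 242 degrees of freedom
    critical_05 = 1.96                      # tabled t for df over 100

    print(f"t = {t:.2f}, df = {df}")        # about 1.41 (1.42 above, from rounding)
    print("Reject Ho" if abs(t) > critical_05 else "Accept Ho: no significant difference")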

EXERCISE:

I. Problem: Class A and Class B were compared in their performance on
an achievement test in Biology given to first-year high school students.
Test at the .05 level that there is no significant difference in mean scores
between the two classes.

II. Data:

Class A Class B

X 74.55 62.42

SD 14.70 18.25

N 6.6 8.2

III. Questions:
1. Construct the following:

a. RH
b. Ho
c. Ha

2. Compute

a. T
b. Critical value at .05

3. Give the following:

a. Findings
b. Decision
c. Interpretation

Answers:

1.
a. RH

b. Ho – There is no significant difference in mean scores between the two
classes.

c. Ha

2. Compute

a. t

t = (74.55 − 62.42) / √( (14.70)²/6.6 + (18.25)²/8.2 )

  = 12.13 / √( 216.09/6.6 + 333.06/8.2 )

  = 12.13 / √( 32.74 + 40.62 )

  = 12.13 / √73.36

  = 12.13 / 8.57

t = 1.42

b. critical value at .05

Computed Value              Critical Value at .05
t = 1.42                    df = (N1 − 1) + (N2 − 1)
                            df = (6.6 − 1) + (8.2 − 1)
                            df = 5.6 + 7.2
                            df = 12.8
                            .05 (df ≈ 13) = 2.16
3.
a. Findings

– t (1.42) is less than the critical value (2.16)

b. Decision: Accept Ho.


There is no significant difference in mean scores between the two
classes.

c. Interpretation/ Implication
The performance of Class A and Class B on the achievement test is
comparable.

Module 9
How to compute the Significant difference between two mean scores when N is
small (below 30)

A. Hypothetical Problem

An experiment is conducted to find out if the use of modular instruction is


effective in the teaching of Mathematics in Grade IV. Two classes, experimental
and control groups were organized. Test at .05 that there is no significant
difference in performance between the two groups. Is modular instruction
effective in teaching Grade IV Mathematics?

B. Hypothetical Data

Control Group Experimental Group

Scores Scores

18 15

17 18

16 19

12 18

19 19

15 20

19 19

12 N=8

N = 10

C. Formula

(Critical t-ratio for small N's)

t = ( X1 − X2 ) / √( S1²/N1 + S2²/N2 )      where S² = Σ( x − X )² / ( N − 1 )

D. Statistical Procedures

Step 1: Compute the mean score of each group:

a. Mean 1 = 16
b. Mean 2 = 18

Step 2: Compute the variance (S²) of each group:

a. S1² = 64 / (10 − 1) = 64/9 = 7.11

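The remaining steps of this module are not shown above. As an illustrative sketch only (not the module's own worked answer), the formula in C can be implemented in Python as follows; the score lists are the ones printed above, and since the module states N's of 10 and 8 while the listing shows fewer scores, the printed result should be treated purely as an illustration of the procedure.

    from math import sqrt

    def sample_variance(scores):
        # S^2 = sum of (x - X)^2 divided by (N - 1)
        m = sum(scores) / len(scores)
        return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)

    def small_n_t(group1, group2):
        # t = (X1 - X2) / sqrt(S1^2/N1 + S2^2/N2)
        m1 = sum(group1) / len(group1)
        m2 = sum(group2) / len(group2)
        se = sqrt(sample_variance(group1) / len(group1) +
                  sample_variance(group2) / len(group2))
        return (m1 - m2) / se

    control = [18, 17, 16, 12, 19, 15, 19, 12]            # scores as listed above
    experimental = [15, 18, 19, 18, 19, 20, 19]
    print(f"t = {small_n_t(control, experimental):.2f}")  # compare with Table A at the chosen df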

Module 10
How to compute the significant difference in two mean scores between
pre-test and post-test scores.

I. Hypothetical Problem
Test at .05 level if there is a significant mean gain between pre and
post test scores in an activity- centered science class.

II. Null Hypothesis


There is no significant differences in mean scores between pre and
post test data.

III. Formula

t = Xd / ( S / √N )        where S² = Σ( xd − Xd )² / ( N − 1 )

IV. Data

Students   Pre-test score   Post-test score   xd    Xd   xd − Xd   (xd − Xd)²
A               8                11            3     3      0           0
B               2                 4            2     3     −1           1
C               6                10            4     3     +1           1

N = 3                                                    Σ(xd − Xd)² = 2

V. Statistical Procedure

Step 1 – Compute the mean difference (Xd): Xd = (3 + 2 + 4) / 3 = 3

Step 2 – Compute the variance (S²):

S² = Σ( xd − Xd )² / ( N − 1 ) = 2 / (3 − 1) = 2/2 = 1

S = √1 = 1
Step 3 – Compute the square root of N:

√3 = 1.73

Step 4 – Substitute the values in the formula:

t = Xd / ( S / √N )

t = 3 / ( 1 / 1.73 )

t = 3 / 0.58

t = 5.17

Step 5 – Match the computed t-value with the tabled value.

Computed Value              Tabled Value at .05
t = 5.17                    df = N − 1 = 3 − 1 = 2
                            .05 = 4.30 (two-tailed)
                                = 2.92 (one-tailed)

VI. Findings
t( 5.17) > .05 (2.92)
The computed t (5.17) is greater than the tabled value of 2.92.

VII. Decision
Reject the Null Hypothesis.

VIII. Interpretation:

There is a significant difference between the pre-test and post-test mean
scores, pointing to gains from the activity. The pupils have improved in their
science skills after the activity.
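As an illustrative check (not part of the module), the pre/post computation can be reproduced with the Python sketch below, using the formula in III; carrying more decimal places gives a t of about 5.20 rather than the rounded 5.17 above.

    from math import sqrt

    pre = [8, 2, 6]
    post = [11, 4, 10]

    d = [b - a for a, b in zip(pre, post)]                      # gains: 3, 2, 4
    n = len(d)
    mean_d = sum(d) / n                                         # 3
    s = sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))       # S = 1

    t = mean_d / (s / sqrt(n))                                  # about 5.20
    df = n - 1                                                  # 2

    print(f"t = {t:.2f}, df = {df}")   # compare with 2.92 (one-tailed) or 4.30 (two-tailed)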
