
MMW 101

MATHEMATICS IN THE
MODERN WORLD

UNIT IV
Data Management
Statistics: Our Life Saver
and Influencer

Duration: 15 Hours

Overview of the Unit

Most of us look at statistics as a mere school subject that we have to take as an
academic requirement. However, we live in a world of information, where correct data
and sound statistical concepts are necessary to solve complex problems. The vital role
of statistics is therefore the management of facts and figures that helps make this
world a better, safer, and more comfortable place to live in. Because we are so used
to the many things around us, we tend to overlook what helps us plan and achieve
goals successfully, what to consider when making the right decisions, and what lets
us enjoy a secure life. Let me remind you that our lives are saved and influenced
daily by the contributions of Statistics in the following:
1. predictions
2. quality testing
3. weather forecasting
4. emergency preparedness
5. political campaign
6. predicting diseases
7. insurance
8. consumer goods
9. financial market
10. sports

May these serve as an inspiration and motivation for you to learn and
understand the lessons that you have to go through in this unit as follows:

4.1 Collection of Data


4.2 Organizing Data
4.3 Measures of Central Tendency
4.4 Quantiles
4.5 Measures of Variability

Objectives of the Unit


At the end of the unit, you are expected to:
1. Use a variety of statistical tools to process and manage data
2. Advocate the use of statistical data in making important decisions
Let's Start!

Lesson 4.1 Collection of Data

Objectives of the Lesson


At the end of the lesson, you should be able to:
1. differentiate the types of data
2. tell the differences among the different methods of collecting data
3. determine the sample size
4. identify the appropriate sampling technique to be used in gathering data

Daily, we come across different kinds of information, data, facts, and figures
from various communication and information media. Some current examples are:
• surveys conducted by SWS on ratings of public offices and officials or
  opinions of the public on issues, using a sample of 1200 respondents
• the daily data on Philippine COVID-19 cases provided by the Department of
  Health (such data include new cases, recoveries, and deaths added to
  the previous total), where the active cases are categorized as
  asymptomatic, mild, severe, or critical, with the corresponding percentages.
  UP experts also use these data to project the total number of cases over a
  given period, and their projections have appeared to be reasonably accurate.
• weather conditions and forecasts
• the clinical trials for COVID-19 vaccines
• reports of DOTr on the number of commuters in relation to the number of public
  transport vehicles allowed to operate during the quarantine periods
• reports on the stock market situation
• the estimated funds of Philhealth that were lost due to corruption

Lesson 4.1 will give you insights into how data, such as those mentioned above, are
collected.

Lesson 4.1.1 Data Gathering

4.1.1.1 Types of Data

There are two types of data, namely:


1. Primary data. These are data or information gathered by the researcher
from first-hand sources, like government offices, private organizations, business
establishments, or individuals with first-hand information about the needed data.

2. Secondary data. These are data or information obtained from published
or unpublished sources like newspapers, magazines, journals, books and theses,
and other republished materials.

For example, the details of a vehicular accident gathered by a policeman who
interviewed the victims are primary data. The viewers who watched the news about
this incident are getting only secondary data.

4.1.1.2 Methods of Collecting Data


1. The Direct or Interview Method. This method involves the interviewer
(the person conducting the interview/the researcher) and the interviewee
(the person from whom data is being gathered). This method gives
the researcher the opportunity to ask the interviewee follow-up questions
to obtain all the information that he needs. He can also make
clarifications, if necessary. Questions can be repeated or rephrased so that
the person being interviewed understands them clearly. The interviewer
has to see to it that he does not influence the responses of the interviewee
in any manner. However, this method is more time-consuming and more
costly.

Two forms of Interviews

1. Structured. The questions are closed-ended, prepared in advance, and
asked with the same wording for every interviewee.
2. Unstructured. In this form, there are no prepared questions at hand
before the interview.

2. The Indirect or Questionnaire Method. This method utilizes
questionnaires to be answered by the respondents. These questionnaires
must be carefully prepared so that the respondents can clearly understand
the directions and the questions and give honest responses. Many
researchers use this method since it is less time-consuming. It is also less
expensive because the questionnaires can be reused. A limitation of this
method is that it can only be used with literate respondents.

These are the two forms of questions.

1. A fixed-alternative question is a question where the possible responses
from which the respondent is to choose his answer are given.

Example: How often do you read the newspaper?


O once a day O when there is an assignment
O every day O every other day
O once a week O never at all

2. An open-ended question allows the respondent to express his answer
freely.

Examples: 1) How much longer do you think it would take for the pandemic to
             end?
          2) What are the common problems that students encounter
             in the online mode of learning, and what can you suggest
             to remedy these?

Other methods of collecting data are as follows:


1. The observation method is used when gathering data about the behavior of
individuals in the study.

2. Experimentation Method. In this method, the researcher observes the effect
of a variable on other variables. The independent variable of the study is the
variable that is manipulated to see its effect/s on the dependent variable (the
variable that may change).

3. Registration Method refers to the continuous, permanent, compulsory
recording of the occurrence of significant events and certain identifying or
descriptive characteristics concerning them, as provided through the civil
code, laws, or regulations of each country.

The vital events may be live births, fetal deaths, deaths, marriages, divorces,
judicial separations, annulments of marriage, adoptions, recognitions
(acknowledgments of natural children), and legitimations.

4. Texting Method. In this method, the researcher gathers the survey data
through text messages.

4.1.1.3 Determining the Sample Size

In conducting a study, the researcher must consider the time and the cost
involved in completing the study. This is why most researchers make use of a sample
(a representative part of the population that possesses the characteristics of the
population) instead of the population (the entirety of objects, individuals, events, or
things). Slovin's formula is used to determine an appropriate sample size from the
population.

Slovin's formula is

$$n = \frac{N}{1 + N e^{2}}$$     (Formula 1)

where: n = sample size; N = population size; e = margin of error

The margin of error shows how reliable the result of the survey is. A small
margin of error means that it is more likely that the results of the survey are true for
the population.

Example 1. A group of environmentalists is conducting a survey on the
opinions of people regarding the placement of dolomite sand in and around Manila Bay.
If the population of the study is 6000 residents in Metro Manila and the margin of
error to be used is 5%, what would be the sample size?

Solution:
Given: N = 6000     e = 5% = 0.05

$$n = \frac{N}{1 + N e^{2}} = \frac{6000}{1 + (6000)(0.05)^{2}} = \frac{6000}{16} = 375$$

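Example 2. Using Example 1, what will be the sample size if the margin of
error is 8%?

$$n = \frac{N}{1 + N e^{2}} = \frac{6000}{1 + (6000)(0.08)^{2}} = \frac{6000}{39.4} \approx 152.28$$

Rounded up to the next whole respondent, the sample size is 153.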

Did you notice that the bigger the margin of error, the smaller the
sample size becomes?
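
The two computations above can be checked with a short Python sketch of Formula 1. The function name `slovin_sample_size` is only an illustrative choice, and the function returns the unrounded value so that the rounding convention stays up to the researcher.

```python
def slovin_sample_size(population: int, margin_of_error: float) -> float:
    """Return the unrounded sample size n = N / (1 + N * e**2) (Formula 1)."""
    return population / (1 + population * margin_of_error ** 2)

# Example 1: N = 6000, e = 5%
n_5 = slovin_sample_size(6000, 0.05)
# Example 2: N = 6000, e = 8%
n_8 = slovin_sample_size(6000, 0.08)

print(round(n_5, 2), round(n_8, 2))
```

Trying other margins of error with the same population is a quick way to see that a larger e gives a smaller n.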

Example 3. Another researcher wants to conduct the same survey. However,
due to time constraints and a limited budget, he will be using 100 respondents only.

To solve for the margin of error (e) in this example, the formula to be used is

$$e = \sqrt{\frac{N - n}{nN}}$$     (Formula 1a)

In the formula, e is the margin of error, N is the population size, and n is the
sample size.

Solution:
Given: N = 6000     n = 100

$$e = \sqrt{\frac{N - n}{nN}} = \sqrt{\frac{6000 - 100}{100(6000)}} = 0.09916 \text{ or } 9.92\%$$
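
As a quick check of Formula 1a, the sketch below computes the margin of error for Example 3. The function name `margin_of_error` is an illustrative assumption, not part of the module.

```python
import math

def margin_of_error(population: int, sample_size: int) -> float:
    """Return e = sqrt((N - n) / (n * N)) (Formula 1a)."""
    return math.sqrt((population - sample_size) / (sample_size * population))

# Example 3: N = 6000, n = 100
e = margin_of_error(6000, 100)
print(round(e, 5), f"{e * 100:.2f}%")
```

Formula 1a is Formula 1 solved for e, so substituting this e back into Slovin's formula should return a sample size of about 100.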

4.1.1.4 Sampling Techniques

Do you still remember the difference between a population and a sample?


Now let us explore sampling.

Sampling is the process of choosing the samples from a population.

There are two kinds of selecting/drawing samples:

1) Probability Sampling (or random sampling) is a sampling technique
where every member of the population has an equal chance of being selected to be
part of the sample. For example, if there are 200 members in the population, each of
these 200 has the same chance of being part of the survey/study.

2) Non-Probability Sampling is a sampling technique where not all of the
members of the population have a chance to be included in the sample. For
example, a researcher is conducting a survey in a barangay. He did not include in
his survey those who live far from the main road. Those people were not given a
chance to be part of the survey.

4.1.1.4.1 Probability Sampling

Here are the kinds of probability sampling.


1. Simple Random Sampling. This probability sampling is the simplest
among the types of probability sampling. In this method, numbers are assigned to
the members of the population. Numbers are drawn, and the element of the
population whose number is drawn becomes a part of the sample. This method is
also known as the fishbowl or lottery technique.

Example: A survey is to be conducted among the grade 11 students of a
school. There are 200 grade 11 students. A students' list is prepared, and pieces of
paper numbered 1 to 200 are placed in a container. The researcher draws papers
from this container. Students whose numbers are drawn will become part of the
sample in the survey. The Table of Random Numbers or a number generator can
also be used in drawing the sample from the population.
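
The number generator mentioned above can be sketched in Python with the standard `random` module. The sample size of 20 and the seed are assumptions made only for illustration, since the example does not state how many students are drawn.

```python
import random

def simple_random_sample(population_size: int, sample_size: int, seed=None):
    """Draw distinct numbers from 1..population_size, like slips from a fishbowl."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(range(1, population_size + 1), sample_size)

# 200 grade 11 students; the sample size of 20 is an assumed value.
drawn = simple_random_sample(200, 20, seed=1)
print(sorted(drawn))
```

Because `sample` draws without replacement, no student number can appear twice, which mirrors drawing slips of paper without returning them to the container.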

2. Systematic Random Sampling is a method where every kth
element in the population list is selected in obtaining the sample needed.

Example:
Given: N = 1400 and n = 141

Step 1. Determine k (sampling interval) by dividing the population size by the
sample size.

$$k = \frac{N}{n} = \frac{1400}{141} = 9.93 \approx 10$$

(This means that every 10th element in the population list will be included
in the sample until 141 samples are obtained.)

Step 2. The random start can be determined by choosing from the numbers 1,
2, 3, 4, 5, 6, 7, 8, 9, and 10 (k = 10). This can be done by lottery.
If number 4 happens to come out, then the random start is 4.

random start: 4 (the 4th member of the population is included in the
sample)

The second number is obtained by adding k to the random start.
4 + k = 4 + 10 = 14
The third number is the 2nd number + k.
14 + k = 14 + 10 = 24

Repeat the procedure until the desired sample size is obtained.

These are the first 20 numbers of the sample: 4, 14, 24, 34, 44, 54, 64, 74, 84,
94, 104, 114, 124, 134, 144, 154, 164, 174, 184, 194
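
The two steps above can be sketched in Python as follows. The function names are illustrative, and the random start of 4 is taken from the example rather than drawn by the program.

```python
def sampling_interval(population_size: int, sample_size: int) -> int:
    """Step 1: k = N / n, rounded to the nearest whole number."""
    return round(population_size / sample_size)

def systematic_positions(start: int, interval: int, count: int):
    """Step 2: list positions start, start + k, start + 2k, ..."""
    return [start + i * interval for i in range(count)]

k = sampling_interval(1400, 141)
first_20 = systematic_positions(4, k, 20)
print(k, first_20)
```

In practice the random start would itself be drawn by lottery from 1 to k, for example with `random.randint(1, k)`.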

3. Stratified Random Sampling is done by splitting the population into
groups or categories. The samples to be chosen from the groups must be
proportional to the size of the group. This means fewer samples will be taken from
smaller groups, and more samples will be taken from bigger groups.

Example:
Given: N = 4370 patients; n = 151
Male Patients - 2734; Female Patients - 1636

Step 1. Divide the sample size by the population size.

$$p \text{ (proportional allocation)} = \frac{n}{N} = \frac{151}{4370} = 0.034554$$

Step 2. Multiply the result obtained in step 1 (p) by the size of each group to
get the number of samples to be taken from that group.

Category    No. of patients    p          Sample Size (n)
Male        2734               .034554    94
Female      1636               .034554    57
Total       4370                          151
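
The proportional allocation in the table can be reproduced with a small Python sketch. The function name `proportional_allocation` is an illustrative choice. Rounding each group separately happens to add up to 151 here, but this is not guaranteed for every data set.

```python
def proportional_allocation(strata: dict, sample_size: int) -> dict:
    """Multiply each group size by p = n / N and round to a whole number."""
    population_size = sum(strata.values())
    p = sample_size / population_size
    return {name: round(size * p) for name, size in strata.items()}

patients = {"Male": 2734, "Female": 1636}
allocation = proportional_allocation(patients, 151)
print(allocation, sum(allocation.values()))
```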

4. Cluster Sampling is used to randomly select the samples from a
population spread out over a wide geographical area. The cluster is used as a
sampling unit, meaning that all the individuals in that cluster will be included in the
sample. This is sometimes called area sampling.

A cluster is a group where the objects or individuals in the group are more
similar to each other as compared to those from other groups.

Example: A sample of 100 health workers in Malolos will be chosen as
respondents in a study. The researcher may consider the barangays as
clusters. He can select the clusters randomly. After he has chosen the
barangays, he can now include all the health workers belonging to the
chosen barangays as part of the sample.
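
The idea of sampling whole clusters can be sketched in Python. The barangay names and health worker labels below are hypothetical placeholders, not data from the example, and the number of clusters to draw is an assumed value.

```python
import random

# Hypothetical barangays and health workers, used only for illustration.
barangays = {
    "Barangay A": ["A1", "A2", "A3"],
    "Barangay B": ["B1", "B2"],
    "Barangay C": ["C1", "C2", "C3", "C4"],
    "Barangay D": ["D1", "D2"],
}

def cluster_sample(clusters: dict, number_of_clusters: int, seed=None):
    """Randomly pick whole clusters, then keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), number_of_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

chosen, members = cluster_sample(barangays, 2, seed=7)
print(chosen, members)
```

The randomness applies to the choice of barangays only. Once a barangay is chosen, all of its members enter the sample.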

5. Multi-Stage Sampling is done by utilizing a combination of sampling
techniques. This is used when drawing samples from a huge population.

Example: A survey is to be conducted on the opinion of the beneficiaries of
the Social Amelioration Program (SAP) in the province of Bulacan.
Randomly select/draw the municipalities in the province. Then randomly
choose the barangays in the chosen municipalities. Lastly, randomly
choose the beneficiaries in the chosen barangays.

4.1.1.4.2 Non-probability Sampling

Non-probability sampling has three kinds.

1. Quota Sampling is a sampling technique where the population is
divided into categories, just like in stratified sampling. But there is no required
sampling frame that must be used in determining the samples of the study.

Example: A researcher decided to interview 50 audience members during the
concert of a K-pop group at the Philippine Arena. Since he already knows that
most of the viewers are female, he decided to have 80% female
respondents and 20% male respondents. So he approached and
interviewed forty (40) female and ten (10) male audience members.

2. Purposive Sampling is done by selecting a sample based on the
purpose or needs of the study.

Example: A researcher would like to know why grade 10 students preferred
to transfer to public schools this school year. For this purpose, he will
interview only grade 10 transferees.

3. Incidental/Accidental/Convenience Sampling. This sampling
technique is often used in market research. People who are easy to reach and can
give a quick response are the ones chosen to become samples in the study.

Example: A researcher wanted to know how well consumers accept a
brand of disinfectant. He will possibly interview his friends, relatives, and
neighbors, or go to public places nearby and conduct interviews.

Lesson 4.2 Organizing Data

Objectives of the Lesson


At the end of the lesson, you are expected to:
1. differentiate ungrouped data from grouped data
2. organize ungrouped data using an array, a stem-leaf plot, or a frequency
distribution table
3. construct a frequency distribution table for grouped data correctly
4. calculate the relative frequencies, cumulative frequencies, and cumulative
percentage frequencies of a given data set
5. interpret the values found in the frequency distribution table

You have learned the different ways to gather data and the sampling
techniques from which you can choose the one that you will employ in your research.
Now is the time for you to know what to do with the data that you have gathered. It is
essential to organize your data so that you can easily interpret them.

Data may be ungrouped or grouped. Ungrouped data are unsorted or raw data.
This means that the data have not been grouped or classified according to any
characteristic. On the other hand, grouped data are data that have been organized
into groups or classes.

Lesson 4.2.1 Ways of Organizing Ungrouped Data

These are the ways of organizing ungrouped data.

1. By forming an array
An array is an arrangement of numbers in increasing or decreasing
order.

2. By constructing a stem-leaf plot
A stem-leaf plot is a way of organizing data where each value is split into
two parts: the stem consists of the hundreds or tens digits while the
leaves are the units digits.

3. By constructing a frequency distribution table
An ungrouped frequency distribution table is a table showing each data
value and its frequency.

The three ways of organizing a set of ungrouped data are shown using the
example below.

The following are the ages of 25 employees in a supermarket.

42, 51, 44, 28, 32, 24, 30, 25, 24, 35, 43, 37, 28,

28, 22, 45, 29, 28, 36, 35, 50, 25, 25, 46, 44
You can organize the data in the following ways:

Array: (from youngest to oldest)

22, 24, 24, 25, 25, 25, 28, 28, 28, 28, 29, 30, 32,
35, 35, 36, 37, 42, 43, 44, 44, 45, 46, 50, 51

Stem-leaf Plot: (Assuming that we did not form an array, let us refer to the original
data.)

42, 51, 44, 28, 32, 24, 30, 25, 24, 35, 43, 37, 28,
28, 22, 45, 29, 28, 36, 35, 50, 25, 25, 46, 44

In the first value, 42, the digit 4 is the stem, and the digit 2 is the leaf. In the
second value, 51, the digit 5 is the stem, and the digit 1 is the leaf. Continue plotting
all the ages of the employees. After all the ages have been plotted, make another
table and arrange the leaves in increasing order.

Draft: (as the data is given)

STEMS   LEAVES
2       8, 4, 5, 4, 8, 8, 2, 9, 8, 5, 5
3       2, 0, 5, 7, 6, 5
4       2, 4, 3, 5, 6, 4
5       1, 0

Final: (after arranging the leaves from the lowest to the highest)

STEMS   LEAVES
2       2, 4, 4, 5, 5, 5, 8, 8, 8, 8, 9
3       0, 2, 5, 5, 6, 7
4       2, 3, 4, 4, 5, 6
5       0, 1
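
The array and the final stem-leaf plot can also be produced with a short Python sketch. The function name `stem_leaf` is illustrative, and the sketch assumes two-digit values so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

ages = [42, 51, 44, 28, 32, 24, 30, 25, 24, 35, 43, 37, 28,
        28, 22, 45, 29, 28, 36, 35, 50, 25, 25, 46, 44]

def stem_leaf(values):
    """Group the units digits (leaves) under their tens digit (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):      # sorting first forms the array
        plot[value // 10].append(value % 10)
    return dict(plot)

plot = stem_leaf(ages)
for stem, leaves in plot.items():
    print(stem, "|", ", ".join(str(leaf) for leaf in leaves))
```

Sorting the data before splitting each value means the leaves come out already arranged from lowest to highest, so no separate draft table is needed.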

Ungrouped frequency distribution table

Table No. 1
Ages of 25 Employees in a Supermarket

Ages     Frequency
22       1
24       2
25       3
28       4
29       1
30       1
32       1
35       2
36       1
37       1
42       1
43       1
44       2
45       1
46       1
50       1
51       1
Total    25

Note: You may first form an array or a stem-leaf plot so that it would be easier to
construct the frequency distribution table.

Interpretation of the data may be made in this way.

The table shows that of the twenty-five employees of the supermarket, the
youngest is twenty-two years old while the oldest is fifty-one years old. Eleven of the
employees are in their twenties, six are in their thirties, six are in their forties, and
only two are in their fifties.
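
The frequencies in Table No. 1 and the counts used in the interpretation can be verified with Python's `collections.Counter`. The variable names are illustrative.

```python
from collections import Counter

ages = [42, 51, 44, 28, 32, 24, 30, 25, 24, 35, 43, 37, 28,
        28, 22, 45, 29, 28, 36, 35, 50, 25, 25, 46, 44]

# Ungrouped frequency distribution: each age with its number of occurrences.
frequency = Counter(ages)
table = sorted(frequency.items())

# Counts per decade (20s, 30s, 40s, 50s) for the interpretation.
per_decade = Counter(age // 10 * 10 for age in ages)

for age, count in table:
    print(age, count)
print(dict(sorted(per_decade.items())))
```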

Lesson 4.2.2 Frequency Distribution for Grouped Data


Let us now learn how to organize grouped data. A frequency distribution table
is used to organize this kind of data. The data are sorted into groups or classes. The
frequency distribution table shows the number of occurrences of the data in the
different classes.

Example of a frequency distribution table:

Table No. 2
Scores of a Sample of 40 Students in a Biology Test

Class Intervals     Number of       Class Boundaries    Class Marks
(Scores) LL-UL      Students (f)    LB - UB             (Xi)
17-21               2               16.5 - 21.5         19
22-26               5               21.5 - 26.5         24
27-31               8               26.5 - 31.5         29
32-36               12              31.5 - 36.5         34
37-41               7               36.5 - 41.5         39
42-46               5               41.5 - 46.5         44
47-51               1               46.5 - 51.5         49
c = 5               n = 40

Note: Always write the table number and the table title so that the reader would
know what the data is about. The class size and the total number of frequencies must
also be written after the last class interval.

You have to remember the definition of the following terms that are found in the
frequency distribution table:

Class interval refers to the grouping bounded by the lower limit (LL) and upper limit
(UL).
Class size (c) is the length or width of the class.
Class frequency (f) is the number of observations falling within a class interval.
Class boundaries refer to the true boundaries (true limits) of a class interval.

In this example, 17-21 is the first class interval where 17 is the lower limit and 21
is the upper limit. The lower limit of the first class interval is usually the lowest value
in the data. The upper limit 21 was obtained by counting 5 units (since c = 5) starting
from the lower limit 17 (17, 18, 19, 20, 21). To get the succeeding lower limits, just
add 5, which is the class size. Do the same for the upper limits.

Lower limit    Class Intervals (Scores)    Upper limit
               17-21
17 + 5         22-26                       21 + 5
22 + 5         27-31                       26 + 5
27 + 5         32-36                       31 + 5
32 + 5         37-41                       36 + 5
37 + 5         42-46                       41 + 5
42 + 5         47-51                       46 + 5
               c = 5

Class Intervals    Number of       Class Boundaries
(Scores)           Students (f)    (LCB - UCB)
17-21              2               16.5 - 21.5 *
22-26              5               21.5 - 26.5

The table shows that there were 2 students who scored between 17 and 21 while 5
students got scores between 22 and 26.

The lower class boundaries are obtained by subtracting 0.5 from the lower limits.
The upper class boundaries are obtained by adding 0.5 to the upper limits.
* 17 - 0.5 = 16.5
* 21 + 0.5 = 21.5

Class mark or class midpoint refers to the representative of the class interval.

$$\text{Class mark} = \frac{\text{lower limit} + \text{upper limit}}{2} = \frac{17 + 21}{2} = 19$$

Class Intervals    Number of       Class Mark
(Scores)           Students (f)    Xi
17-21              2               19
22-26              5               24

Note: If the class size is an odd number, the class mark is the middle value
(17, 18, 19, 20, 21). If the class size is an even number, the class mark is the
average of the two middle values. For example, for a class interval of 5-10
(5, 6, 7, 8, 9, 10), the class mark is (7+8)/2, which is 7.5.

Another way to get the succeeding class marks: After getting the class mark of the
first class interval, just add to it the value of c. For example, 19 + 5 = 24.

4.2.2.1 Constructing a Frequency Distribution Table

You have to follow the steps below to construct a frequency distribution table. To
show you these steps, let us consider the test scores of 50 students in
Statistics recorded as follows:
Table 1
Test Score of 50 Students in Statistics
48 39 55 65 51
79 63 89 29 54
65 58 64 76 90
30 84 50 55 59
69 43 79 44 40
49 50 24 78 71
63 64 73 35 65
58 36 47 86 46
85 74 64 72 54
38 52 33 53 42

Step 1: Determine the Range (R) of the distribution. The range is equal to the
highest score minus the lowest score.

Range (R) = Highest Score - Lowest Score Formula 2

R = 90 - 24
R = 66

Step 2: Determine the class size by dividing the range by the desired number of
classes. (The number of classes must not be too few nor too many. Too
many class intervals may result in classes with zero frequencies.) Let us
have ten classes on this problem. In some cases, the class size is already
given.

$$\text{Class size or class width } (c) = \frac{\text{Range}}{\text{number of classes}}$$     (Formula 3)

$$c = \frac{66}{10} = 6.6 \approx 7$$

(If the class size is not exact, round it off to the nearest whole number.)
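
Steps 1 and 2 can be checked with a short Python sketch that applies Formula 2 and Formula 3 to the 50 test scores. The variable names are illustrative. Note that Python's built-in `round` sends exact halves to the nearest even number, which may differ from the rounding rule taught in class, although it does not matter for 6.6.

```python
scores = [48, 39, 55, 65, 51, 79, 63, 89, 29, 54,
          65, 58, 64, 76, 90, 30, 84, 50, 55, 59,
          69, 43, 79, 44, 40, 49, 50, 24, 78, 71,
          63, 64, 73, 35, 65, 58, 36, 47, 86, 46,
          85, 74, 64, 72, 54, 38, 52, 33, 53, 42]

number_of_classes = 10

# Formula 2: Range = Highest Score - Lowest Score
score_range = max(scores) - min(scores)

# Formula 3: class size = Range / number of classes, rounded
class_size = round(score_range / number_of_classes)

print(score_range, class_size)
```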

Step 3: Unless otherwise specified, always use the lowest value of the given data
(raw data) as the lowest class limit. For the second lower limit, just add the
class size, and then continue to add the class size to each lower limit to get the
rest of the lower limits. To get the first upper limit, subtract one (1) from the
second lower limit. For the second upper limit, just add the class size, and then
continue to add the class size to each upper limit to get the rest of the upper
limits.

Note: The last class interval should contain the highest value.
Constructing the Class Limits and the Resulting Class Limits/Class Intervals

Lower Limits (LL)                Upper Limits (UL)                        Class Intervals
(LL + class size)                (UL + class size)                        LL - UL
lowest score           = 24      2nd lower limit minus 1 (31 - 1) = 30    24 - 30
24 + 7 (2nd lower limit) = 31    30 + 7 = 37                              31 - 37
31 + 7 = 38                      37 + 7 = 44                              38 - 44
38 + 7 = 45                      44 + 7 = 51                              45 - 51
45 + 7 = 52                      51 + 7 = 58                              52 - 58
52 + 7 = 59                      58 + 7 = 65                              59 - 65
59 + 7 = 66                      65 + 7 = 72                              66 - 72
66 + 7 = 73                      72 + 7 = 79                              73 - 79
73 + 7 = 80                      79 + 7 = 86                              80 - 86
80 + 7 = 87                      86 + 7 = 93                              87 - 93

This gives 10 classes.
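
Step 3 can be sketched in Python as follows. The function name `class_limits` is illustrative. Each lower limit is the previous one plus the class size, and each upper limit is one less than the next lower limit.

```python
def class_limits(lowest: int, class_size: int, number_of_classes: int):
    """Return (lower limit, upper limit) pairs for consecutive classes."""
    limits = []
    for i in range(number_of_classes):
        lower = lowest + i * class_size
        upper = lower + class_size - 1   # one less than the next lower limit
        limits.append((lower, upper))
    return limits

limits = class_limits(lowest=24, class_size=7, number_of_classes=10)
for lower, upper in limits:
    print(f"{lower} - {upper}")
```

The last interval, 87 - 93, contains the highest score of 90, as the note in Step 3 requires.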

Step 4: Determine the class boundaries by subtracting 0.5 from each of the lower
class limits and adding 0.5 to each of the upper class limits.

Constructing the Class Boundaries and the Resulting Class Boundaries

Lower Limit    Lower Boundaries    Upper Limit    Upper Boundaries    Class Boundaries
- 0.5          (LB)                + 0.5          (UB)                LB - UB
24 - 0.5       23.5                30 + 0.5       30.5                23.5 - 30.5
31 - 0.5       30.5                37 + 0.5       37.5                30.5 - 37.5
38 - 0.5       37.5                44 + 0.5       44.5                37.5 - 44.5
45 - 0.5       44.5                51 + 0.5       51.5                44.5 - 51.5
52 - 0.5       51.5                58 + 0.5       58.5                51.5 - 58.5
59 - 0.5       58.5                65 + 0.5       65.5                58.5 - 65.5
66 - 0.5       65.5                72 + 0.5       72.5                65.5 - 72.5
73 - 0.5       72.5                79 + 0.5       79.5                72.5 - 79.5
80 - 0.5       79.5                86 + 0.5       86.5                79.5 - 86.5
87 - 0.5       86.5                93 + 0.5       93.5                86.5 - 93.5

Step 5: Calculate the class marks or class midpoints. The class mark is the numerical
location of the center of the class and is computed as follows:

$$\text{Class mark or class midpoint } (X_i) = \frac{\text{lower limit (LL)} + \text{upper limit (UL)}}{2}$$     (Formula 4)

(LL + UL) / 2     Class Midpoint Xi     OR shortcut: previous midpoint + class size     Class Midpoint Xi
(24 + 30) / 2     27                    (24 + 30) / 2                                   27
(31 + 37) / 2     34                    27 + 7                                          34
(38 + 44) / 2     41                    34 + 7                                          41
(45 + 51) / 2     48                    41 + 7                                          48
(52 + 58) / 2     55                    48 + 7                                          55
(59 + 65) / 2     62                    55 + 7                                          62
(66 + 72) / 2     69                    62 + 7                                          69
(73 + 79) / 2     76                    69 + 7                                          76
(80 + 86) / 2     83                    76 + 7                                          83
(87 + 93) / 2     90                    83 + 7                                          90
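
Steps 4 and 5 can be sketched together in Python. The class limits are rebuilt from the lowest score 24 and the class size 7 used in this example, and the variable names are illustrative.

```python
class_size = 7
limits = [(24 + i * class_size, 30 + i * class_size) for i in range(10)]

# Step 4: subtract 0.5 from each lower limit and add 0.5 to each upper limit.
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in limits]

# Step 5 (Formula 4): class mark = (LL + UL) / 2
class_marks = [(lower + upper) / 2 for lower, upper in limits]

# Shortcut: each class mark is the previous one plus the class size.
shortcut_marks = [class_marks[0] + i * class_size for i in range(10)]

for (lb, ub), mark in zip(boundaries, class_marks):
    print(f"{lb} - {ub}", mark)
```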
Step 6: Tally the data. Write the numerical equivalent of the tally on the column for
frequency. The best example of tallying the scores is counting the votes cast
in an election.

Tally of the Test Scores (from Table 1, column by column)

Class        First       Second      Third       Fourth      Fifth
Intervals    Column      Column      Column      Column      Column      Tally        f
LL - UL
24 - 30      30                      24          29                      III          3
31 - 37                  36          33          35                      III          3
38 - 44      38          39, 43                  44          40, 42      IIII-I       6
45 - 51      48, 49      50          50, 47                  51, 46      IIII-II      7
52 - 58      58          58, 52      55          55, 53      54, 54      IIII-III     8
59 - 65      65, 63      63, 64      64, 64      65          59, 65      IIII-IIII    9
66 - 72      69                                  72          71          III          3
73 - 79      79          74          79, 73      76, 78                  IIII-I       6
80 - 86      85          84                      86                      III          3
87 - 93                              89                      90          II           2

Applying the steps, Table No. 2 shows what the frequency distribution table looks like.

Table No. 2
Frequency Distribution of the 50 Test Scores in Statistics

Class Intervals    Class Boundaries    Class Marks    Frequency
LL - UL            LB - UB             Xi             f
24 - 30            23.5 - 30.5         27             3
31 - 37            30.5 - 37.5         34             3
38 - 44            37.5 - 44.5         41             6
45 - 51            44.5 - 51.5         48             7
52 - 58            51.5 - 58.5         55             8
59 - 65            58.5 - 65.5         62             9
66 - 72            65.5 - 72.5         69             3
73 - 79            72.5 - 79.5         76             6
80 - 86            79.5 - 86.5         83             3
87 - 93            86.5 - 93.5         90             2
c = 7                                                 n = 50

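
The tallying in Step 6 can be reproduced in Python by counting how many scores fall within each class interval. This is a minimal sketch with illustrative variable names, using the same 50 test scores.

```python
scores = [48, 39, 55, 65, 51, 79, 63, 89, 29, 54,
          65, 58, 64, 76, 90, 30, 84, 50, 55, 59,
          69, 43, 79, 44, 40, 49, 50, 24, 78, 71,
          63, 64, 73, 35, 65, 58, 36, 47, 86, 46,
          85, 74, 64, 72, 54, 38, 52, 33, 53, 42]

class_size = 7
limits = [(24 + i * class_size, 30 + i * class_size) for i in range(10)]

# Count the scores that lie between each lower and upper limit, inclusive.
frequencies = [sum(lower <= score <= upper for score in scores)
               for lower, upper in limits]

for (lower, upper), f in zip(limits, frequencies):
    print(f"{lower} - {upper}", f)
print("n =", sum(frequencies))
```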
Start making the Worksheet on Data Management. Read the instructions
carefully and do Direction 1.

Now let us continue with the cumulative frequency distribution.

4.2.2.2 Cumulative Frequency Distribution

A cumulative frequency distribution can be constructed from a frequency
distribution by adding a column called "Cumulative Frequency." A cumulative
frequency refers to subtotals obtained from the successive additions of the
frequencies. This may be done in two ways:

1. The less than cumulative frequencies (<cf) refer to the frequencies
added successively from the lowest class interval to the highest class interval.

2. The greater than cumulative frequencies (>cf) refer to the frequencies
added successively from the highest class interval to the lowest class interval.

Illustration 1
(Data from Test Scores of 50 Students in Statistics)

The <cf column is obtained by successive addition of the frequencies from top to
bottom. The >cf column is obtained by successive addition of the frequencies from
bottom to top.

Frequency    Addition from     Less Than Cumulative    Addition from     Greater Than Cumulative
f            top to bottom     Frequency (<cf)         bottom to top     Frequency (>cf)
3                              3                       47 + 3            50
3            3 + 3             6                       44 + 3            47
6            6 + 6             12                      38 + 6            44
7            12 + 7            19                      31 + 7            38
8            19 + 8            27                      23 + 8            31
9            27 + 9            36                      14 + 9            23
3            36 + 3            39                      11 + 3            14
6            39 + 6            45                      5 + 6             11
3            45 + 3            48                      2 + 3             5
2            48 + 2            50                                        2
n = 50

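
The successive additions in Illustration 1 are running totals, which can be sketched in Python with `itertools.accumulate`. The variable names are illustrative.

```python
from itertools import accumulate

frequencies = [3, 3, 6, 7, 8, 9, 3, 6, 3, 2]

# <cf: add the frequencies successively from the lowest class to the highest.
less_than_cf = list(accumulate(frequencies))

# >cf: add from the highest class to the lowest, then restore the table order.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for f, lcf, gcf in zip(frequencies, less_than_cf, greater_than_cf):
    print(f, lcf, gcf)
```

The last <cf value and the first >cf value must both equal n = 50, which is a convenient check on the arithmetic.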
Then let us proceed to make the table on cumulative percentage frequency.

Cumulative Percentage Frequency

Cumulative percentage frequency is another column in the frequency
distribution. It is obtained by dividing the cumulative frequency by the total number of
cases (n) then multiplying by 100. It shows the percentage of students falling below
or above (<cpf or >cpf) individual scores. The formula is:

$$\text{Cumulative Percentage Frequency (cpf)} = \frac{cf}{n} \times 100$$     (Formula 5)

This is how to use the formula.

Illustration 2

Frequency    Cumulative Frequency    Cumulative Percentage Frequency
f            <cf       >cf           <cpf                    >cpf
3            3         50            (3/50) x 100 = 6        (50/50) x 100 = 100
3            6         47            (6/50) x 100 = 12       (47/50) x 100 = 94

Now our table looks like this with the addition of the column on cumulative
percentage frequency.
Table 3
Cumulative Percentage Distribution of 50 Test Scores in Statistics

Class        Class            Frequency    Cumulative      Cumulative Percentage
Intervals    Boundaries                    Frequency       Frequency
LL - UL      LB - UB          f            <cf     >cf     <cpf     >cpf
24 - 30      23.5 - 30.5      3            3       50      6        100
31 - 37      30.5 - 37.5      3            6       47      12       94
38 - 44      37.5 - 44.5      6            12      44      24       88
45 - 51      44.5 - 51.5      7            19      38      38       76
52 - 58      51.5 - 58.5      8            27      31      54       62
59 - 65      58.5 - 65.5      9            36      23      72       46
66 - 72      65.5 - 72.5      3            39      14      78       28
73 - 79      72.5 - 79.5      6            45      11      90       22
80 - 86      79.5 - 86.5      3            48      5       96       10
87 - 93      86.5 - 93.5      2            50      2       100      4
c = 7                         n = 50

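
The two cumulative percentage columns of Table 3 follow directly from Formula 5, as this short Python sketch shows. The variable names are illustrative.

```python
less_than_cf = [3, 6, 12, 19, 27, 36, 39, 45, 48, 50]
greater_than_cf = [50, 47, 44, 38, 31, 23, 14, 11, 5, 2]
n = 50

# Formula 5: cpf = (cf / n) x 100, written as cf * 100 / n
less_than_cpf = [cf * 100 / n for cf in less_than_cf]
greater_than_cpf = [cf * 100 / n for cf in greater_than_cf]

for lcpf, gcpf in zip(less_than_cpf, greater_than_cpf):
    print(lcpf, gcpf)
```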
This is how to interpret the cumulative frequency and the cumulative
percentage frequency.

Remember: Use the upper class boundaries in interpreting the <cf and
the <cpf. (lower than the upper class boundaries)
Use the lower class boundaries in interpreting the >cf and the
>cpf. (higher than the lower class boundaries)

The following is an example of an interpretation for the scores of the 50 students in
Statistics.

(For less than cumulative frequency and less than cumulative percentage frequency,
use the <cf and <cpf columns of the table.)

As seen in Table 3, of the 50 students who took the test in Statistics,
three (3) students scored lower than 30.5 while six (6) students scored lower
than 37.5. Twenty-seven (27), or more than half of them, scored lower than 58.5.
Moreover, 24% of the students scored lower than 44.5, while 78% scored lower
than 72.5.

(For greater than cumulative frequency and greater than cumulative percentage
frequency, use the >cf and >cpf columns of the table.)

Of the 50 students who took the test in Statistics, thirty-eight (38)
students got scores higher than 44.5, while only two (2) students got scores
higher than 86.5. On the other hand, 62% of them scored higher than 51.5,
while only 10% got higher than 79.5.

Let us add one more column, this time on relative frequency.

4.2.2.3 Relative Frequency Distribution

A relative frequency distribution can be created from a frequency distribution
by adding a column called "Relative Frequency." A relative frequency is the ratio of
the individual frequency per class to the total number of cases (n) then multiplied by
100%. The formula is:

$$\text{Relative Frequency (rf)} = \frac{f}{n} \times 100\%$$     (Formula 6)

Illustration No. 3
Relative Frequency Distribution of 50 Test Scores in Statistics

Frequency    Computation          Relative Frequency
f                                 (rf)
3            (3/50) x 100%        6
3            (3/50) x 100%        6
6            (6/50) x 100%        12
7                                 14
8                                 16
9                                 18
3                                 6
6                                 12
3                                 6
2                                 4
n = 50                            100%

Table 4 shows the addition of the column on Relative Frequency (RF).

Table 4
Cumulative Percentage and Relative Frequency Distribution of 50 Test Scores
in Statistics

Class        Class            Frequency    Cumulative      Cumulative Percentage    Relative
Intervals    Boundaries                    Frequency       Frequency                Frequency
LL - UL      LB - UB          f            <cf     >cf     <cpf     >cpf            (RF)
24 - 30      23.5 - 30.5      3            3       50      6        100             6
31 - 37      30.5 - 37.5      3            6       47      12       94              6
38 - 44      37.5 - 44.5      6            12      44      24       88              12
45 - 51      44.5 - 51.5      7            19      38      38       76              14
52 - 58      51.5 - 58.5      8            27      31      54       62              16
59 - 65      58.5 - 65.5      9            36      23      72       46              18
66 - 72      65.5 - 72.5      3            39      14      78       28              6
73 - 79      72.5 - 79.5      6            45      11      90       22              12
80 - 86      79.5 - 86.5      3            48      5       96       10              6
87 - 93      86.5 - 93.5      2            50      2       100      4               4
c = 7                         n = 50                                                100%
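
The relative frequency column can be checked with a short Python sketch of Formula 6. The variable names are illustrative.

```python
frequencies = [3, 3, 6, 7, 8, 9, 3, 6, 3, 2]
n = sum(frequencies)

# Formula 6: rf = (f / n) x 100%, written as f * 100 / n
relative_frequencies = [f * 100 / n for f in frequencies]

for f, rf in zip(frequencies, relative_frequencies):
    print(f, f"{rf:.0f}%")
print("Total:", sum(relative_frequencies))
```

The relative frequencies should always add up to 100%, apart from small rounding differences in other data sets.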

This is how to interpret relative frequency.

As shown in Table 4, of the fifty (50) students who took
the test in Statistics, 6% got scores between 24 and 30, and another 6% scored
between 31 and 37. The largest group of students, 18%, scored between 59 and 65,
while 16% got scores between 52 and 58.
