
Module 4

Data Organization and Presentation
Objectives:

At the end of the lesson, the students are expected to:


1. prepare a stem-and-leaf plot;
2. construct a frequency distribution table;
3. create graphs for qualitative and quantitative data;
4. read and interpret graphs and tables; and
5. perform simple analysis of data.

Introduction

In every research activity, the information gathered may result in large masses of data.
These data need to be organized and presented in such a manner that they can be easily
understood. Data sets are usually organized in tables and displayed through graphs.

TABULAR PRESENTATION OF QUALITATIVE DATA

Suppose you asked a sample of 20 persons where in the Philippines they would like
to spend their summer vacation. Their responses were recorded, and the results are as
follows:
Boracay Baguio Palawan Bohol Boracay
CamSur Bohol Baguio Palawan Bohol
Palawan Bohol CamSur Palawan Boracay
Boracay Palawan CamSur Bohol Palawan

We may construct a frequency distribution table for these data. Note that the
variable in our activity, “It’s more fun in the Philippines”, is the preferred tourist
destination, which is qualitative in nature. To construct a frequency distribution for qualitative
data, we simply list all categories and count the number of responses that belong to each
category.

The variable in the activity is classified into five categories: Baguio, Boracay, Bohol,
CamSur and Palawan. These categories are recorded in the first column of the frequency
distribution table. Each response in the data is then read, and a tally mark (|) is placed in the
second column beside its category. Finally, the total number of tallies for each category is
recorded in the third column of the table, called the column of frequencies and usually denoted
by f. The sum of the entries in the frequency column gives the sample size (n), or the total
frequency.
The frequency distribution table for the data set on tourist destinations is as follows:

Table 1. Frequency Distribution of Preferred Tourist Destination

Tourist Destination   Tally      Frequency (f)
Baguio                ||         2
Bohol                 |||||      5
Boracay               ||||       4
CamSur                |||        3
Palawan               ||||| |    6
                                 n = 20
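As a minimal sketch, the tallying procedure can be reproduced in Python, assuming the 20 responses above are typed in as a list of strings. Here `collections.Counter` plays the role of the tally column, and the variable names are illustrative only.

```python
from collections import Counter

# The 20 responses, row by row as listed above
responses = [
    "Boracay", "Baguio", "Palawan", "Bohol", "Boracay",
    "CamSur", "Bohol", "Baguio", "Palawan", "Bohol",
    "Palawan", "Bohol", "CamSur", "Palawan", "Boracay",
    "Boracay", "Palawan", "CamSur", "Bohol", "Palawan",
]

# Counter tallies how many responses fall into each category
freq = Counter(responses)

# Print the categories in alphabetical order, as in Table 1
for destination in sorted(freq):
    print(f"{destination:<10} {'|' * freq[destination]:<8} {freq[destination]}")

n = sum(freq.values())  # total frequency (sample size)
print(f"n = {n}")
```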
Additional information may be derived from the data set, such as the relative
frequency and percentage distribution. The relative frequency of a category is the fraction, or
proportion, of the total frequency that belongs to that category. The percentage for a category
is obtained by multiplying the relative frequency of that category by 100.

Elemtary Statistics 2
In calculating the relative frequency and percentage distribution, we use:

Relative Frequency:
$$rf = \frac{f}{n}$$
where: rf – relative frequency
       f – frequency for each category
       n – total frequency or sample size

Percentage:
$$\text{Percentage} = rf \times 100$$

The relative frequency and percentage distributions of the data are given below:

Relative Frequency and Percentage Distributions of Preferred Tourist Destination

Tourist Destination   Frequency (f)   Relative Frequency   Percentage
Baguio                2               2/20 = 0.10          10.0
Bohol                 5               5/20 = 0.25          25.0
Boracay               4               4/20 = 0.20          20.0
CamSur                3               3/20 = 0.15          15.0
Palawan               6               6/20 = 0.30          30.0
Total                 n = 20          Σ rf = 1.00          Σ = 100.0
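The two formulas can be checked with a short Python sketch. It assumes the frequencies from Table 1 and uses `fractions.Fraction` so that each rf stays exact (for example 5/20 = 1/4) before being shown as a decimal; the exact arithmetic is a choice made for this sketch, not part of the module.

```python
from fractions import Fraction

# Frequencies from Table 1 (category -> f)
freq = {"Baguio": 2, "Bohol": 5, "Boracay": 4, "CamSur": 3, "Palawan": 6}
n = sum(freq.values())  # total frequency, n = 20

# rf = f / n, kept as an exact fraction to avoid rounding drift
rel_freq = {cat: Fraction(f, n) for cat, f in freq.items()}
# Percentage = rf x 100
percent = {cat: rf * 100 for cat, rf in rel_freq.items()}

for cat in freq:
    print(f"{cat:<10} {freq[cat]:>2} {float(rel_freq[cat]):.2f} {float(percent[cat]):.1f}")

# The relative frequencies sum to 1 and the percentages to 100
print(sum(rel_freq.values()), sum(percent.values()))
```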

GRAPHICAL PRESENTATION OF QUALITATIVE DATA

Data are easier to read when presented or displayed through graphs. Graphs give a
visual representation that communicates information about complicated relationships
among statistical data, helping readers grasp the information more effectively.

Some of the graphs that may be used to present qualitative data are:
1. Bar graph
A bar graph uses vertical or horizontal bars to compare the sizes of quantities. The
heights (or lengths) of the bars represent the frequencies of the respective categories.

Example: Bar Graph of Preferred Tourist Destination

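[Figure: Bar graph with one bar each for Baguio, Bohol, Boracay, CamSur and Palawan; horizontal axis: tourist destination; vertical axis: frequency (0 to 6)]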
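Before a bar graph is drawn on paper or with software, it can help to draft it in plain text. The function `text_bar_graph` below is a hypothetical helper written only for this illustration: it prints one horizontal bar per category, with the bar length equal to the frequency in Table 1.

```python
# Frequencies from Table 1 (category -> f)
freq = {"Baguio": 2, "Bohol": 5, "Boracay": 4, "CamSur": 3, "Palawan": 6}

def text_bar_graph(freq, symbol="#"):
    """Return one line per category; the bar length equals the frequency."""
    width = max(len(cat) for cat in freq)  # pad names so the bars line up
    return [f"{cat:<{width}} | {symbol * f} {f}" for cat, f in freq.items()]

for line in text_bar_graph(freq):
    print(line)
```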

2. Pie Graph

A pie graph is used to show the relationship of the parts to a whole. It is displayed as
a circle divided into sectors that represent the relative frequencies or percentages of a
population or sample belonging to different categories.

Example: Pie Graph of Preferred Tourist Destination

[Figure: Pie graph titled “Tourist Destination”: Baguio 10%, Bohol 25%, Boracay 20%, CamSur 15%, Palawan 30%]

To construct a pie graph, we first determine the number of degrees that represents each
fractional part or percent of the respective categories. Take note that a circle contains 360
degrees. This means that we multiply the relative frequency of each category by 360 degrees
to get the angle size of its sector in the pie graph.
Example:
Tourist Destination   f        rf     Angle Size (degrees)
Baguio                2        0.10   360(0.10) = 36
Bohol                 5        0.25   360(0.25) = 90
Boracay               4        0.20   360(0.20) = 72
CamSur                3        0.15   360(0.15) = 54
Palawan               6        0.30   360(0.30) = 108
Total                 n = 20   1.00   360
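The angle computation can be sketched in Python as follows, assuming the Table 1 frequencies. Each sector's angle is 360 × f/n, and the angles should add up to a full circle.

```python
# Frequencies from Table 1 (category -> f)
freq = {"Baguio": 2, "Bohol": 5, "Boracay": 4, "CamSur": 3, "Palawan": 6}
n = sum(freq.values())

# Angle of each sector = 360 degrees x relative frequency (f / n)
angles = {cat: 360 * f / n for cat, f in freq.items()}

for cat, angle in angles.items():
    print(f"{cat:<10} {angle:6.1f} degrees")

# A full circle: the angles must add up to 360 degrees
print(sum(angles.values()))
```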

3. Line Graph
A line graph makes use of line segments to show changes and relationship between
quantities.

Example: Figure 4. Average Age of the Total Population: 1980, 1990, 1995, 2000-2011,
2016, 2017, and 2040

Sources: 1/ Based on the 1980, 1990 and 2000 Census of Population and Housing (CPH) and 1995 Census of Population of NSO.
2/ Special computations made by the NSCB-Technical Staff (NSCB-TS) using the 2000 Census-based Population Projections of NSO.

Take Note: Bar graphs and line graphs may also be used to compare quantities of two
or more data sets. Different styles or colors of bars and lines may be used to
distinguish the groups from one another.

Example: Gross Domestic Product and Gross National Income, at Constant Prices, 2000
to 2011

[Figure: GDP and GNI series; horizontal axis: year (2000 to 2011); vertical axis: value at constant prices (0 to 9,000,000)]

Source of data: National Statistical Coordination Board (NSCB)

GDP – Gross Domestic Product
GNI – Gross National Income

[Figure: Dependency Ratio by Type in Percent, Census Years 1970, 1975, 1980, 1990, 1995, 2000, 2007 and 2010]

TABULAR PRESENTATION OF QUANTITATIVE DATA

Data for quantitative variables may likewise be organized by determining the
frequency counts belonging to groups called classes or class intervals. To present such data
effectively, we may prepare a stem-and-leaf display or construct a frequency distribution
table.

Suppose a region-wide survey was conducted to determine the region's functional
literacy rate. Functional literacy, according to the National Statistics Office (NSO), is a higher
level of literacy which includes not only reading and writing skills but also numerical and
comprehension skills. The survey covered household members aged 10-64 years in the
provinces and key cities of the region. The literacy rates of the sample were determined, and
the results are as follows:

84 78 90 84 95 82 84 75 83 89
88 90 88 91 89 85 98 86 92 93
66 98 81 87 74 89 98 79 84 87
80 89 73 86 82 94 97 94 86 93
93 95 96 97 88 77 96 76 88 92

Literacy rate, a quantitative variable, may be organized using a stem-and-leaf display
or a frequency distribution table.

STEM-and-LEAF DISPLAY

A stem-and-leaf display presents quantitative data in condensed form while retaining
each individual observation, so no information is lost. Each value in this type of presentation
is divided into two parts: a stem and a leaf. The leaves for each stem are listed separately on
that stem's row.

How to Prepare a Stem-and-Leaf Display

1. Split each value into two parts. The first part is the first digit, which is called the
stem. The second part will be the second digit, which is called the leaf.
2. Draw a vertical line and write the stems on the left side of it arranged in ascending
order.
3. After listing the stems, read the leaves for all values and record them next to the
corresponding stems on the right side of the vertical line.

Example: For the given data, the first two values are 84 and 78; thus:

stems
7 | 8      (leaf for 78)
8 | 4      (leaf for 84)
9 |

The resulting stem-and-leaf display of the given data is:

6 | 6
7 | 8 5 4 9 3 7 6
8 | 4 4 2 4 3 9 8 8 9 5 6 1 7 9 4 7 0 9 6 2 6 8 8
9 | 0 5 0 1 8 2 3 8 8 4 7 4 3 3 5 6 7 6 2
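The three steps above can be sketched in Python, assuming every value has exactly two digits (as in the literacy-rate data), so the stem is the tens digit and the leaf is the units digit. The helper name `stem_and_leaf` is hypothetical. Leaves are kept in the order they appear in the data, matching the display above; many texts also sort the leaves within each stem.

```python
from collections import defaultdict

# Functional literacy rates of Region X (50 values, row by row)
data = [
    84, 78, 90, 84, 95, 82, 84, 75, 83, 89,
    88, 90, 88, 91, 89, 85, 98, 86, 92, 93,
    66, 98, 81, 87, 74, 89, 98, 79, 84, 87,
    80, 89, 73, 86, 82, 94, 97, 94, 86, 93,
    93, 95, 96, 97, 88, 77, 96, 76, 88, 92,
]

def stem_and_leaf(values):
    """Split each two-digit value into a stem (tens) and a leaf (units)."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # e.g. 84 -> stem 8, leaf 4
        leaves[stem].append(leaf)   # leaves kept in order of appearance
    return dict(sorted(leaves.items()))  # stems in ascending order

display = stem_and_leaf(data)
for stem, leaf_list in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaf_list))
```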

A frequency distribution for quantitative data lists all the classes and the number of
values belonging to each class. Data presented in this form are called grouped data.

To construct a frequency distribution table for quantitative data, we have the following steps:

1. Find the range of the data set. The range (R) is given by the difference between the highest
(H) and lowest (L) data entries. So, for our given data set we have:
R = H – L = 98 – 66 = 32
2. Determine the number of classes, also known as the number of class intervals (c). Each
class groups a range of values of the variable. One rule to help us decide on the number of
classes is Sturges' formula, given by
$$c = 1 + 3.322 \log n$$

where: c – number of classes
       n – sample size / total frequency

Therefore: c = 1 + 3.322 log 50 ≈ 1 + 3.322(1.699) ≈ 6.64, so we use c = 7 classes.

3. Find the class size (i), also known as the class width of the data set. Divide the range by
the number of classes (c) and round up to find the class size. Thus, we have
$$i = \frac{R}{c}$$

where: i – class size
       R – range
       c – number of classes

i = 32 / 7 ≈ 4.57, which rounds up to i = 5
4. List the class intervals of the data set. For the given data, we construct seven (7) classes
with a class size (i) of 5. Determine also the lower limits and the upper limits of the classes.
a. The lower limit of the first class interval is the number nearest to the lowest data value
that is divisible by the class size. This value may be less than or equal to the lowest
value.
For the given data, the lowest value is 66. The nearest number to 66 that is divisible by
5 is 65, which is the lower limit of the first class interval. To find the lower limits of
the remaining 6 classes, add the class size to the lower limit of each previous class.
b. The upper limit of the first class interval is one less than the lower limit of the second
class. The upper limits of the remaining six classes are determined by adding the class
size to the upper limit of each previous class.

5. Tally the entries from each class interval.

6. The number of tally marks for a class interval is the frequency for that class. The frequency
distribution for the given data is shown below.

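Table 2. Literacy Rates of Provinces and Key Cities of Region X

Literacy Rate   Tally               Frequency (f)
65 – 69         |                   1
70 – 74         ||                  2
75 – 79         |||||               5
80 – 84         ||||| ||||          9
85 – 89         ||||| ||||| ||||    14
90 – 94         ||||| |||||         10
95 – 99         ||||| ||||          9
                                    n = 50

In this table, the literacy rate is the variable; 75 – 79 is the third class; 9 is the frequency
of the fourth class; 90 is the lower limit of the sixth class; 89 is the upper limit of the fifth
class; and n = 50 is the number of cases, or sample size.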
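The six steps can be chained together in a short Python sketch. It assumes base-10 logarithms in Sturges' formula, rounds both c and i up, and starts the first class at the largest multiple of i that does not exceed the lowest value, which is one reading of step 4a; all names are illustrative.

```python
import math

# Functional literacy rates of Region X (50 values, row by row)
data = [
    84, 78, 90, 84, 95, 82, 84, 75, 83, 89,
    88, 90, 88, 91, 89, 85, 98, 86, 92, 93,
    66, 98, 81, 87, 74, 89, 98, 79, 84, 87,
    80, 89, 73, 86, 82, 94, 97, 94, 86, 93,
    93, 95, 96, 97, 88, 77, 96, 76, 88, 92,
]

n = len(data)
R = max(data) - min(data)                  # Step 1: range, 98 - 66 = 32
# Step 2: Sturges' formula; assumption: round up (rounding to nearest also gives 7 here)
c = math.ceil(1 + 3.322 * math.log10(n))
i = math.ceil(R / c)                       # Step 3: class size, 32/7 = 4.57 -> 5

# Step 4: first lower limit = largest multiple of i not exceeding the lowest value
first_ll = (min(data) // i) * i
# Each upper limit is one less than the next class's lower limit
classes = [(first_ll + k * i, first_ll + k * i + i - 1) for k in range(c)]

# Steps 5-6: count (tally) the values falling in each class interval
table = [(ll, ul, sum(ll <= x <= ul for x in data)) for ll, ul in classes]

for ll, ul, f in table:
    print(f"{ll} - {ul}  {'|' * f:<15} {f}")
print("n =", sum(f for _, _, f in table))
```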

After constructing a frequency distribution such as above, there are several additional
features that we may include to help better understand the data.

1. Classmark (xm)
The classmark (xm), sometimes called the midpoint of the class interval, is the sum of the
lower and upper limits of the class interval divided by two. Thus,
$$x_m = \frac{UL + LL}{2}$$

where: xm – class mark
       UL – upper class limit
       LL – lower class limit

2. Class Boundaries
A class boundary is the midpoint between the upper limit of one class and the lower limit of
the next class. The class boundaries are the real limits of the class intervals.

Given below are the classmarks and class boundaries of the data in Table 2.

Classmark and Class Boundaries of a Frequency Distribution Table of Functional Literacy
Rate of Region X

Literacy Rate   Frequency (f)   Classmark (xm)   Class Boundaries (Lower – Upper)
65 – 69         1               67               64.5 – 69.5
70 – 74         2               72               69.5 – 74.5
75 – 79         5               77               74.5 – 79.5
80 – 84         9               82               79.5 – 84.5
85 – 89         14              87               84.5 – 89.5
90 – 94         10              92               89.5 – 94.5
95 – 99         9               97               94.5 – 99.5

For example, 87 is the classmark of the 5th class, and 74.5 is the upper class boundary of
the 2nd class.
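Classmarks and class boundaries follow directly from the class limits. In this sketch the boundaries are placed halfway across the gap between consecutive classes (the gap is 1 here because the limits are whole numbers), which reproduces the 0.5 adjustment used in the table above; the variable names are illustrative.

```python
# Class limits from Table 2 (lower limit, upper limit), lowest class first
classes = [(65, 69), (70, 74), (75, 79), (80, 84), (85, 89), (90, 94), (95, 99)]

# Classmark: xm = (UL + LL) / 2
classmarks = [(ll + ul) / 2 for ll, ul in classes]

# Boundaries: halfway between the upper limit of one class and the lower
# limit of the next, i.e. subtract/add half of the gap between classes
gap = classes[1][0] - classes[0][1]  # 70 - 69 = 1
boundaries = [(ll - gap / 2, ul + gap / 2) for ll, ul in classes]

for (ll, ul), xm, (lb, ub) in zip(classes, classmarks, boundaries):
    print(f"{ll} - {ul}  xm = {xm:g}  boundaries: {lb} - {ub}")
```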

Take Note: Grouping raw data into classes may distort or lose some information, so
the frequency distribution table should be constructed carefully.

3. Relative Frequency (rf)
The relative frequency (rf) of a class interval is the portion or part of the data that falls in
that class. To find the rf, we have:
$$rf = \frac{f}{n}$$

where: rf – relative frequency
       f – frequency of the given class
       n – total number of cases or sample size

4. Cumulative Frequency (cf)

The cumulative frequency of a class interval is the sum of the frequency for the given
class and all previous classes. The frequencies may be cumulated starting from the lowest
class interval, which gives the less than cumulative frequency (<cf), or starting from the
highest class interval, which gives the greater than cumulative frequency (>cf).

5. Percentage

The percentage distribution lists the percentage of each class, obtained by multiplying
the relative frequency of the class by 100.
Percentage = rf × 100

6. Cumulative Percentage Frequency (cpf)

The cumulative percentage frequency of a class interval is the sum of the percentage
for the given class and all previous classes. As with the cumulative frequency, this may be
done in two ways: by adding the percentages starting either from the lowest class interval or
from the highest class interval.

The relative frequency and percentage of each class in Table 2 are given here:

Relative Frequency and Percentage Distribution Table of Functional Literacy Rate of Region X

Literacy Rate   Frequency (f)   Relative Frequency (rf)   Percentage (%)
65 – 69         1               0.02                      2
70 – 74         2               0.04                      4
75 – 79         5               0.10                      10
80 – 84         9               0.18                      18
85 – 89         14              0.28                      28
90 – 94         10              0.20                      20
95 – 99         9               0.18                      18
Total           n = 50          1.00                      100
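The cumulative columns described in items 4 and 6 can be computed from the same frequencies. A minimal sketch, assuming the Table 2 frequencies and using `itertools.accumulate` for the running sums (from the lowest class for <cf and from the highest class for >cf); the column labels are illustrative.

```python
from itertools import accumulate

# Frequencies of the seven classes in Table 2, lowest class first
labels = ["65-69", "70-74", "75-79", "80-84", "85-89", "90-94", "95-99"]
freq = [1, 2, 5, 9, 14, 10, 9]
n = sum(freq)  # 50

rel_freq = [f / n for f in freq]                       # rf = f / n
percent = [100 * f / n for f in freq]                  # percentage = rf x 100
less_cf = list(accumulate(freq))                       # <cf: add from the lowest class
greater_cf = list(accumulate(reversed(freq)))[::-1]    # >cf: add from the highest class
less_cpf = [100 * cf / n for cf in less_cf]            # cumulative % from the lowest class
greater_cpf = [100 * cf / n for cf in greater_cf]      # cumulative % from the highest class

print("Class   f    rf     %   <cf  >cf  <cpf  >cpf")
for row in zip(labels, freq, rel_freq, percent, less_cf, greater_cf, less_cpf, greater_cpf):
    label, f, rf, pct, lcf, gcf, lcpf, gcpf = row
    print(f"{label}  {f:>2}  {rf:.2f}  {pct:>3.0f}  {lcf:>3}  {gcf:>3}  {lcpf:>4.0f}  {gcpf:>4.0f}")
```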

GRAPHICAL PRESENTATION OF QUANTITATIVE DATA

Pictures convey a message more effectively than columns of numbers. It is easier to
identify patterns in a data set through a visual presentation of its frequency table. Visual
models, such as graphs, provide a better understanding of a data set.
Recall that for qualitative data, we may present the data set using a bar graph, line
graph, pictograph or pie graph. To show the information obtained from a frequency table of
quantitative data, we may use a histogram and a frequency polygon.

Histogram

A histogram is a bar graph that represents the frequency distribution of a “continuous”
data set. It has the following properties:
1. The horizontal scale is quantitative and measures the data values.
2. The vertical scale measures the frequency of each class interval.
3. There is “no gap” between consecutive bars.

Steps in Constructing Histogram

1. Mark the horizontal axis with the classmarks of the class intervals and the
vertical axis with the frequencies.
2. Draw a bar for each class such that the classmark is at the center of the bar
and its height represents the frequency of that class.
3. Draw the bars adjacent to each other with no gap between them. The resulting
bar graph is called a frequency histogram, or simply a histogram.

Take Note: There are variants of histogram such as relative frequency histogram
or percentage histogram. The difference depends on whether the
relative frequencies or percentages are marked on the vertical axis.

Example: The frequency histogram of the data in Table 2 is shown below.

[Figure: Frequency histogram; horizontal axis: classmarks 67, 72, 77, 82, 87, 92, 97; vertical axis: frequency (0 to 14)]
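The “no gap” property can be made concrete with a small sketch. Here each bar is stored by its left and right class boundaries (taken from the class boundary table), so the classmark falls at the center of every bar and consecutive bars share an edge. The row of `#` characters printed per bar is only a text stand-in for a drawn histogram.

```python
# Class boundaries and frequencies of Table 2, lowest class first
boundaries = [(64.5, 69.5), (69.5, 74.5), (74.5, 79.5), (79.5, 84.5),
              (84.5, 89.5), (89.5, 94.5), (94.5, 99.5)]
freq = [1, 2, 5, 9, 14, 10, 9]

# Each bar spans its class boundaries, so it is centred on the classmark
bars = [
    {"left": lb, "right": ub, "center": (lb + ub) / 2, "height": f}
    for (lb, ub), f in zip(boundaries, freq)
]

# Consecutive bars share an edge: this is what "no gap" means
assert all(a["right"] == b["left"] for a, b in zip(bars, bars[1:]))

for bar in bars:
    print(f"{bar['center']:g}: {'#' * bar['height']}")
```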

Polygon
Another way of presenting quantitative data in graphical form is by constructing
polygons. This graph is formed by joining the midpoints of the tops of successive bars in a
histogram with straight lines. It emphasizes the continuous change in frequencies.

Steps in Constructing Polygon

1. Mark the horizontal axis with the classmarks of the class intervals and the vertical axis
with the frequencies.
2. Mark a dot above the midpoint of each class interval at a height equal to the frequency
of that class interval.
3. Add one more class at each end and mark their midpoints. Take note that these
two classes have zero frequencies.
4. Join the adjacent dots with straight lines. The resulting line graph is called
a frequency polygon, or simply a polygon.

Take Note: Variants of the polygon include the frequency polygon, with frequencies
marked on the vertical axis; the relative frequency polygon, with relative
frequencies marked on the vertical axis; and the percentage polygon, with
percentages marked on the vertical axis.

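[Figure: Frequency polygon of the data in Table 2; horizontal axis: classmarks 67, 72, 77, 82, 87, 92, 97; vertical axis: frequency (0 to 14)]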
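The plotting points of a frequency polygon can be listed with a short sketch, assuming the classmarks and frequencies of Table 2 and a class size of 5. The two extra zero-frequency classes from step 3 then fall at classmarks 62 and 102.

```python
# Classmarks and frequencies of Table 2, lowest class first
classmarks = [67, 72, 77, 82, 87, 92, 97]
freq = [1, 2, 5, 9, 14, 10, 9]
i = 5  # class size

# Step 3: add one empty class at each end, one class size beyond the data
xs = [classmarks[0] - i] + classmarks + [classmarks[-1] + i]
ys = [0] + freq + [0]

# Step 4: adjacent points are joined by straight line segments
points = list(zip(xs, ys))
segments = list(zip(points, points[1:]))
print(points)
```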

Cumulative Frequency Polygon or Ogive

A cumulative frequency polygon, or ogive (pronounced ō’jive), is a polygon that
presents the cumulative frequency of each class at its class boundaries.

Types of ogives

1. Less than ogive – the upper class boundaries are marked on the horizontal axis
and the less than cumulative frequencies are marked on the vertical axis.
2. Greater than ogive – the lower class boundaries are marked on the horizontal
axis and the greater than cumulative frequencies are marked on the vertical axis.

How to construct an ogive
1. Construct a cumulative frequency distribution.
2. Specify the horizontal and vertical scales of the graph. The horizontal axis consists
of the class boundaries and the vertical axis of the cumulative frequencies.
3. Plot the points that represent the specified class boundaries and their
corresponding cumulative frequencies.
4. Connect the points on the graph.
5. Close each graph with broken lines on both ends.

[Figures: Two ogives of the data in Table 2; horizontal axis: class boundaries 64.5, 69.5, 74.5, 79.5, 84.5, 89.5, 94.5, 99.5; vertical axis: cumulative frequency (0 to 56)]
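The points plotted on the two ogives can be listed with a short sketch. It assumes the Table 2 frequencies and boundaries, and it adds a starting point (64.5, 0) to the less than ogive and an ending point (99.5, 0) to the greater than ogive; this closing convention is an assumption of the sketch.

```python
from itertools import accumulate

# Class boundaries and frequencies of Table 2, lowest class first
lower_b = [64.5, 69.5, 74.5, 79.5, 84.5, 89.5, 94.5]
upper_b = [69.5, 74.5, 79.5, 84.5, 89.5, 94.5, 99.5]
freq = [1, 2, 5, 9, 14, 10, 9]

# Less than ogive: upper boundaries vs. <cf
# (assumption: start at zero at the lowest lower boundary)
less_cf = list(accumulate(freq))
less_ogive = [(lower_b[0], 0)] + list(zip(upper_b, less_cf))

# Greater than ogive: lower boundaries vs. >cf
# (assumption: end at zero at the highest upper boundary)
greater_cf = list(accumulate(reversed(freq)))[::-1]
greater_ogive = list(zip(lower_b, greater_cf)) + [(upper_b[-1], 0)]

print("Less than ogive:   ", less_ogive)
print("Greater than ogive:", greater_ogive)
```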

