
Discussions and Details for February 19, 2021

DATA ORGANIZATION
Frequency Distribution

I. Categorical Frequency Distribution

II. Grouped Frequency Distribution - for numeric data (interval or ratio scale)

- Organizing values of Quantitative Variables (numeric values) in tabular form

Example of a Grouped Frequency Distribution:

Scores of students in an examination

        CLASS (score in exam)    FREQUENCY
        LL        UL
1st      0        14             3
2nd     15        29             5
3rd     30        44             8
4th     45        59            12
5th     60        74            10
6th     75        89             7
7th     90       104             5

Solving for the class size (C):

Using the 1st class (one class only): C = UL – LL + 1 = 14 – 0 + 1 = 15

Using the 3rd and 4th classes (two consecutive classes):

In terms of LL: C = LL(next) – LL(previous) = 45 – 30 = 15
In terms of UL: C = UL(next) – UL(previous) = 59 – 44 = 15

NOTE: The class size is equal (uniform) for all classes.

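Solving for the Range:

Range = class midpoint of the last class – class midpoint of the 1st class = 97 – 7 = 90
(Midpoint of the 7th (last) class = (90 + 104)/2 = 97; midpoint of the 1st class = (0 + 14)/2 = 7.)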
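The class-size and range calculations above can be checked with a short Python sketch. It assumes whole-number class limits, as in the exam-score table; the names `classes`, `class_size` and `midpoint` are illustrative only.

```python
# Exam-score classes from the table above, as (LL, UL) pairs (whole-number limits assumed).
classes = [(0, 14), (15, 29), (30, 44), (45, 59), (60, 74), (75, 89), (90, 104)]


def class_size(ll, ul):
    """One class only: C = UL - LL + 1."""
    return ul - ll + 1


def midpoint(ll, ul):
    """Class midpoint: M = (UL + LL) / 2."""
    return (ul + ll) / 2


c_one = class_size(*classes[0])          # 14 - 0 + 1
c_by_ll = classes[3][0] - classes[2][0]  # LL(next) - LL(previous): 45 - 30
c_by_ul = classes[3][1] - classes[2][1]  # UL(next) - UL(previous): 59 - 44

# The class size should be uniform across all classes.
uniform = all(class_size(ll, ul) == c_one for ll, ul in classes)

# Range for grouped data: midpoint of last class minus midpoint of first class.
grouped_range = midpoint(*classes[-1]) - midpoint(*classes[0])

print(c_one, c_by_ll, c_by_ul, uniform, grouped_range)  # 15 15 15 True 90.0
```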

Features of Grouped Frequency Distribution:


1.) Class or class interval – a specific range of values whose frequency is obtained.

2.) Class limits – the values that bound a given class; these include:

a.) Lower class limit (LL) – lowest value in a given class.
b.) Upper class limit (UL) – highest value in a given class.

3.) Class size/width/length (C) - the number of units of numeric value covered by a given class.
Solving for Class Size (C):
Using the Grouped Frequency Distribution
If only one class interval is used:
a.) C = UL – LL + 1
If two consecutive class intervals are used:
b.) C = LL of next class – LL of the class immediately before it
Alternatively:
LL of next class = LL of previous class + C
c.) C = UL of next class – UL of the class immediately before it
Alternatively:
UL of next class = UL of previous class + C

Additional Application for Class Size:


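Ex: Formulate or set up two consecutive classes such that the LL of the first class is 10 and the class size is 13.

Soln:
        CLASS
        LL      UL
1st     10      22
2nd     23      35
(UL of 1st class = LL + C – 1 = 10 + 13 – 1 = 22; LL of 2nd class = 10 + 13 = 23; UL of 2nd class = 22 + 13 = 35)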
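A minimal sketch of setting up consecutive classes from a first LL and a class size, using C = UL – LL + 1 and LL of next class = LL of previous class + C. The function name `consecutive_classes` is hypothetical, and whole-number limits are assumed.

```python
def consecutive_classes(first_ll, c, k):
    """Return k consecutive (LL, UL) classes of size c, starting at first_ll."""
    classes = []
    ll = first_ll
    for _ in range(k):
        ul = ll + c - 1      # from C = UL - LL + 1
        classes.append((ll, ul))
        ll += c              # LL of next class = LL of previous class + C
    return classes


print(consecutive_classes(10, 13, 2))  # [(10, 22), (23, 35)]
```

With `consecutive_classes(0, 15, 7)` the same function rebuilds the seven exam-score classes, ending with (90, 104).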

Using the Raw Data:


d.) C = Range / desired number of classes (if raw data)
where: Range = Highest value – Lowest value

4.) Class Midpoint (M) - middlemost value in a given class


Where: M = (UL + LL) / 2

5.) Range
Range = Highest Value – Lowest Value (for raw data)

Range = Class midpoint of the last class – Class midpoint of the first class (for grouped frequency distribution)

6.) Class boundaries – values that separate the classes so that there are no gaps in the frequency distribution. They include:
a.) Lower class boundary (LB) = the class boundary just below the LL
b.) Upper class boundary (UB) = the class boundary just above the UL
(Basic Rule: The class limits should have the same number of decimal places as the collected data, but the class boundaries should have one additional decimal place and end with 5. For whole-number limits, LB = LL – 0.5 and UB = UL + 0.5.)

Additional Application for Class Boundaries:


Ex: Specify the LB and UB for the following classes/class intervals:
a.) LL = 2.47, UL = 11.29
b.) LL = 0.013, UL = 0.176
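This exercise can be checked with a short Python sketch of the Basic Rule. It assumes the limits are passed as strings so that their number of decimal places is kept (a float would lose it) and that LL and UL have the same number of decimal places; `class_boundaries` is a hypothetical helper name. The whole-number pair (15, 19) shows the ±0.5 case.

```python
from decimal import Decimal


def class_boundaries(ll: str, ul: str):
    """Return (LB, UB) for class limits given as strings.

    Assumes LL and UL have the same number of decimal places (the Basic Rule).
    Half of one unit in the next decimal place is subtracted from LL and added
    to UL, so each boundary has one extra decimal place ending in 5.
    """
    lo, hi = Decimal(ll), Decimal(ul)
    places = -lo.as_tuple().exponent              # decimal places in the limits
    half_unit = Decimal(5).scaleb(-(places + 1))  # 0.5, 0.05, 0.005, ...
    return lo - half_unit, hi + half_unit


for ll, ul in [("2.47", "11.29"), ("0.013", "0.176"), ("15", "19")]:
    lb, ub = class_boundaries(ll, ul)
    print(f"LL={ll}, UL={ul} -> LB={lb}, UB={ub}")
```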
Additional Examples of Grouped Frequency Distribution:
1. Using LL and UL for classes

2. Using LB and UB

Construction of Numeric Grouped Frequency Distribution:

Guidelines for classes

1. There should be between 5 and 20 classes.


2. The class width/size (C) should be an odd number. When the class limits are whole numbers, this guarantees that the class midpoints are also whole numbers instead of decimals.
3. The classes must be mutually exclusive. This means that no data value can fall into two different classes.
4. The classes must be all-inclusive or exhaustive. This means that all data values must be included.
5. The classes must be continuous. There are no gaps in a frequency distribution. Classes that have no values in them must be included (unless they are the first or last class, which are dropped).
6. The classes must be equal in width/size. The exception here is the first or last class. It is possible to have a "below ..." or "... and above" class. This is often used with ages.

Steps in Creating a Grouped Frequency Distribution

1. Find the largest/highest (maximum) value, H, and the smallest/lowest (minimum) value, L, in the raw data set/file.
2. Compute the range: R = Maximum – Minimum = H – L.
3. Decide on or select the number of classes desired. This is usually between 5 and 20.
4. Find the class width/size by dividing the range by the number of classes and rounding up. There are two things to be careful of here. First, you must round up, not off: normally 3.2 would round off to 3, but rounding up gives 4. If the class midpoints are to be whole numbers, round up further to the next higher odd number; for example, a class size of 4 becomes 5. Second, if the range divided by the number of classes gives an integer value (no remainder), then either add one to the number of classes or add one to the class width, so that the highest value is still covered.

C = Range / No. of classes = (H – L) / no. of classes, rounded up to the next higher odd number
Ex: Given: Raw Data: H = 135, L = 17, no. of classes = 9
Reqd: Class size/width, C, that is an odd number
Soln:
C = Range / No. of classes = (H – L) / no. of classes, rounded up to the next higher odd number

C = (135 – 17) / 9 = 118 / 9 ≈ 13.11, which rounds up to the next higher odd number: C = 15

5. Set up the classes, using the lowest value in the raw data file as the LL of the first or lowest class.
6. Tally the data.
7. Find the frequencies.

Additional tasks may include:

8. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may not be
necessary to find the cumulative frequencies.
9. If necessary, find the relative frequencies and/or relative cumulative frequencies.
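Step 4's rounding rule, applied to the worked example (H = 135, L = 17, 9 classes), can be sketched in Python. Assumption: "rounding up" is read as the smallest whole number strictly greater than (H – L)/k, which also covers the no-remainder case (add one to the width); an even result is then bumped to the next odd number. `odd_class_size` is a hypothetical name.

```python
import math


def odd_class_size(highest, lowest, num_classes):
    """Class size C: (H - L) / k rounded up to the next higher odd number.

    Assumption: "round up" means the smallest whole number strictly greater
    than (H - L) / k, so an exact quotient also gets +1 (the no-remainder rule).
    """
    raw = (highest - lowest) / num_classes
    c = math.floor(raw) + 1   # round up (never round off)
    if c % 2 == 0:
        c += 1                # bump an even width to the next odd number
    return c


print(round((135 - 17) / 9, 2))    # 13.11
print(odd_class_size(135, 17, 9))  # 15
```

For the rpm illustration (H = 286, L = 64, 5 classes) the same rule turns 44.4 into 45.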

Illustration: Constructing a grouped frequency distribution

Ex: Number of revolutions per minute (rpm) of gears


Raw Data:
78 115 215 175 64
178 155 177 108 219
286 84 105 249 88
Reqd: Construct a grouped frequency distribution with 5 classes, with the LL of the first class equal to the lowest value in the data and an odd class size.
Soln:
Steps 1 and 2: Find the H and L values in the raw data and solve for R:
H = 286, L = 64, R = 286 – 64 = 222

Steps 3 and 4: Decide on the no. of classes (5 classes) and solve for the class size, C:
C = 222 / 5 = 44.4, rounded up to the next higher odd number: C = 45

Step 5: Set-up classes (LL and UL)


        CLASSES (rpm)     TALLY    FREQUENCY
        LL      UL
1st     64      108
2nd     109     153
3rd     154     198
4th     199     243
5th     244     288
Answer: Grouped Frequency Distribution

CLASSES (rpm)     FREQUENCY
LL      UL
64      108       6
109     153       1
154     198       4
199     243       2
244     288       2
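Steps 1 to 7 can be sketched in Python and checked against the rpm answer table. The sketch assumes whole-number data and uses the same "round up to the next odd number" reading of Step 4; the helper name `grouped_frequency` is hypothetical.

```python
import math

# Raw data: revolutions per minute (rpm) of gears, from the illustration.
rpm = [78, 115, 215, 175, 64,
       178, 155, 177, 108, 219,
       286, 84, 105, 249, 88]


def grouped_frequency(data, num_classes):
    """Build (LL, UL, f) classes following Steps 1-7 (whole-number data assumed)."""
    h, l = max(data), min(data)          # Step 1: highest and lowest values
    r = h - l                            # Step 2: range
    c = math.floor(r / num_classes) + 1  # Step 4: round up ...
    if c % 2 == 0:
        c += 1                           # ... to an odd class size
    table = []
    ll = l                               # Step 5: first LL = lowest value
    for _ in range(num_classes):
        ul = ll + c - 1
        f = sum(1 for x in data if ll <= x <= ul)  # Steps 6-7: tally and count
        table.append((ll, ul, f))
        ll += c
    return h, l, r, c, table


h, l, r, c, table = grouped_frequency(rpm, 5)
print(f"H={h}, L={l}, R={r}, C={c}")     # H=286, L=64, R=222, C=45
for ll, ul, f in table:
    print(f"{ll:>4} {ul:>4} {f:>3}")
```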

Kinds of Frequencies:
1.) Frequency, f = the count obtained from the usual tally of the raw data.
2.) “Less than” cumulative frequency, <f = the sum of the frequencies of a given class and all classes below it, i.e., the number of values below the upper boundary of that class.
3.) “Greater than” cumulative frequency, >f = the sum of the frequencies of a given class and all classes above it, i.e., the number of values above the lower boundary of that class.
4.) Relative frequency, Rel f = the frequency of a given class divided by the total frequency.
5.) Percentage relative frequency, % Rel f = the relative frequency multiplied by 100 (expressed as a percentage).

Illustration: Solving all types of Frequencies

CLASSES (rpm)     f
LL      UL
64      108       6
109     153       1
154     198       4
199     243       2
244     288       2
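All five kinds of frequencies for this table, plus the plotting points for the two ogives described under Data Presentation, can be sketched with Python's standard `itertools.accumulate`. Whole-number limits are assumed, so the boundaries are LL – 0.5 and UL + 0.5; variable names are illustrative.

```python
from itertools import accumulate

# rpm classes (LL, UL) and frequencies from the table above.
classes = [(64, 108), (109, 153), (154, 198), (199, 243), (244, 288)]
freq = [6, 1, 4, 2, 2]
n = sum(freq)

less_than = list(accumulate(freq))                 # <f: running total from the top
greater_than = list(accumulate(freq[::-1]))[::-1]  # >f: running total from the bottom
rel_f = [f / n for f in freq]                      # Rel f = f / total
pct_rel_f = [round(100 * rf, 2) for rf in rel_f]   # % Rel f, to two decimals

# Ogive points (boundaries = LL - 0.5 and UL + 0.5 for whole-number limits).
less_than_ogive = [(ul + 0.5, cf) for (_, ul), cf in zip(classes, less_than)]
greater_than_ogive = [(ll - 0.5, cf) for (ll, _), cf in zip(classes, greater_than)]

print("LL   UL    f  <f  >f  Rel f  %Rel f")
for (ll, ul), f, lt, gt, rf, p in zip(classes, freq, less_than, greater_than, rel_f, pct_rel_f):
    print(f"{ll:<4} {ul:<4} {f:>2} {lt:>3} {gt:>3} {rf:6.4f} {p:6.2f}")
```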

III. Alternative Approach: Organizing Data by Its Distribution Using a Stem-Leaf Plot

Stem-Leaf Plot - a method of organizing data that combines sorting and graphing.
- a data plot that uses part of each data value as the stem and the remaining part as the leaf to form groups or classes.
- see pages 80 – 83 Bluman pdf
Where:
Stem = the leading digit(s), i.e., the highest place value(s) of a number
Leaf or leaves = the remaining place value(s) or trailing digit(s)

Illustration: Specify the stem and leaf/leaves of the following numbers


Ex: a.) 27 b.) 349
Construction of a Stem-Leaf Plot
1.) Arrange the raw data in order according to magnitude.
2.) Separate each data value into its digits. (see illustration on the board)
3.) Identify the stem (the leading digit(s) of each value) and the leaf (the trailing digit(s) of each value).
4.) Construct the stem-leaf plot

Illustration: Construct the Stem-leaf Plot


Ex: Data on number of trips per day of jeepneys
12 16 23 34 11
20 34 9 20 16
11 15 10 12 26
Reqd: Construct the Stem-Leaf Plot
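The construction steps can be sketched in Python for the jeepney data. The sketch assumes the leaf is the units digit and the stem is everything before it (`divmod(x, 10)`), so 349 would split into stem 34 and leaf 9; `stem_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

# Number of trips per day of jeepneys (raw data from the example).
trips = [12, 16, 23, 34, 11,
         20, 34, 9, 20, 16,
         11, 15, 10, 12, 26]


def stem_leaf(data):
    """Map each stem to its sorted leaves (leaf = units digit, stem = the rest)."""
    plot = defaultdict(list)
    for x in sorted(data):            # Step 1: arrange in order of magnitude
        stem, leaf = divmod(x, 10)    # Steps 2-3: split into stem and leaf
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_leaf(trips)
for stem in range(min(plot), max(plot) + 1):   # Step 4: list every stem, even empty ones
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

For this data it prints stems 0 to 3 with leaves `9`, `0 1 1 2 2 5 6 6`, `0 0 3 6` and `4 4`.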

Data Presentation:
- usually done by representing the data as graphs or charts.
- Converting the organized data (freq. distribution) into graphs

Types of Statistical Graphical Presentation of Frequency Distribution

1) Pie Charts – a circle that is divided into sections or wedges according to the percentage of frequencies in
each category of the distribution.
2) Histogram or Bar Graph - the frequency (y-axis) is plotted against the class boundaries (x-axis) for each class interval.

3) Frequency Polygon – the frequency (y-axis) is plotted against the class midpoints (x-axis) of each class.

4) Ogive – a graph that represents the cumulative frequencies for the classes in a frequency distribution.
4.a) “Less than” Cumulative Frequency Ogive - the “less than” cumulative frequency (y-axis) is plotted against the upper boundaries (x-axis) of each class.
4.b) “Greater than” Cumulative Frequency Ogive – the “greater than” cumulative frequency (y-axis) is plotted against the lower boundaries (x-axis) of each class.

◼ Example 1: Plotting a Pie Chart


◼ Students in an Adult School were surveyed about the type of transport they use to travel to school.
The results were:
walking 9, train 10, tram 6, car 12, bicycle 3.
◼ Construct a pie chart with this information.

Transport   Number of   Percentage of Students        Angle Size for
Type        Students    Preferring Transport Type     Pie Chart
Walking     9           9/40 x 100 = 22.5%            22.5% of 360° = 81°
Train       10          10/40 x 100 = 25%             25% of 360° = 90°
Tram        6           6/40 x 100 = 15%              15% of 360° = 54°
Car         12          12/40 x 100 = 30%             30% of 360° = 108°
Bicycle     3           3/40 x 100 = 7.5%             7.5% of 360° = 27°
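The percentage and angle columns can be reproduced with a few lines of Python (a sketch; variable names are illustrative). Each angle is the category's share of the 40 students multiplied by 360°.

```python
# Transport type -> number of students, from Example 1.
counts = {"Walking": 9, "Train": 10, "Tram": 6, "Car": 12, "Bicycle": 3}
total = sum(counts.values())   # 40 students

results = {}
for transport, n in counts.items():
    percent = n / total * 100  # percentage of students
    angle = n / total * 360    # the same share of the 360° circle
    results[transport] = (percent, angle)
    print(f"{transport:<8} {n:>3} {percent:5.1f}% {angle:6.1f}°")
```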

◼ Example 2: Plotting a Histogram / Bar Graph


◼ Survey results of the ages of students in the Adult Basic Education math classes are shown in this
frequency table.
Class (Age in yrs)    Class Boundaries (x-axis)    Frequency (y-axis)
15-19                 14.5 – 19.5                  13
20-24                 19.5 – 24.5                  15
25-29                 24.5 – 29.5                  20
30-34                 29.5 – 34.5                  10
35-39                 34.5 – 39.5                  8
40-44                 39.5 – 44.5                  4
Use this data to produce a frequency histogram.

◼ Example 3: Plotting a Frequency Polygon


◼ Construct a frequency polygon to represent the data for the record high temperatures for each of the 50 states.
◼ Frequency polygons use class midpoints and frequencies of the classes.

◼ Example 4: Plotting an Ogive


◼ Construct an ogive to represent the data for the record high temperatures for each of the 50 states.
◼ Ogives use upper class boundaries and cumulative frequencies of the classes.
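The record-high temperature data for Examples 3 and 4 are not reproduced in these notes, so the sketch below applies the same rules to the age data of Example 2 instead. It only computes plotting coordinates (no plotting library is assumed) and assumes whole-number limits. Closing the polygon with a zero-frequency midpoint one class below and one class above is a common convention, not something stated in these notes.

```python
from itertools import accumulate

# Age classes (LL, UL) and frequencies from Example 2.
classes = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
freq = [13, 15, 20, 10, 8, 4]
c = classes[1][0] - classes[0][0]             # class size: LL(next) - LL(previous)

boundaries = [(ll - 0.5, ul + 0.5) for ll, ul in classes]   # (LB, UB)
midpoints = [(ll + ul) / 2 for ll, ul in classes]
cum_less = list(accumulate(freq))             # "less than" cumulative frequency

# Histogram: one bar per class spanning LB..UB with height f.
histogram_bars = list(zip(boundaries, freq))

# Frequency polygon: (midpoint, f), closed with f = 0 one class below and above.
polygon = ([(midpoints[0] - c, 0)] + list(zip(midpoints, freq))
           + [(midpoints[-1] + c, 0)])

# "Less than" ogive: (upper boundary, <f), starting from 0 at the first LB.
ogive = [(boundaries[0][0], 0)] + [(ub, cf) for (_, ub), cf in zip(boundaries, cum_less)]

print(histogram_bars)
print(polygon)
print(ogive)
```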

D. Data Summarization/Description/Analysis = Data Analysis:

Summarization of Statistical Data - a method of describing the characteristics of a sample or population based on the data obtained for a given statistical variable.

Ways of Summarizing or Describing Data or Analyzing Data:


1.) Measures of Central Tendency or Measures of Averages
- describe the center of the distribution of the gathered data.
2.) Measures of Variation or Measures of Dispersion
- describe how the data are spread out or scattered away from the center of the distribution.
3.) Measures of Position or Measures of Rank
- describe where a specific data value falls or is located within the data set, i.e., its relative position in comparison with the other data values.

NOTE: Measures found by using all the data values in the population are called “parameters” while measures
obtained by using data values of samples are called “statistics”.

The Measures of Central Tendency or Measures of Averages

The central tendency of a distribution is an estimate of the "center" of a distribution of values. Each such measure is a single value that describes the whole set of data points. There are three major types of estimates of central tendency:

• Mean
• Median
• Mode

1.) Mean or arithmetic average or computational average - applicable only to quantitative data

- is the most common measure of central tendency
- most sensitive to the data distribution
- affected by extremely high or low scores or values in the data (skewed data).
- computed by using all values in the data set.
- has a unique value for each data set.
- usually appropriate for describing the center of a data set with a normal or bell-shaped distribution.
- applicable only to variables measured on a ratio or interval scale (numerical values).

Standard Symbols used for the mean:

µ = for population mean

x̄ = for sample mean

The Mean is appropriate for use in the following situations:

a.) When the distribution consists of ratio or interval data which have no extreme values (too high or
too low in comparison with the other values in the data set).

b.) When other statistics or parameters (like standard deviation, coefficient of correlation, etc.) are
subsequently to be computed.

c.) When the distribution is normal or is not greatly skewed, the mean is usually preferred to either
the median or the mode. In such cases, it provides a better estimate of the corresponding
population parameter than either the median or the mode.

Computation of the Mean:


Case 1: Using Ungrouped Data (Raw Data)

$$\bar{x} = \frac{\sum x}{n} \qquad \text{or, for a population:} \qquad \mu = \frac{\sum x}{N}$$

i.e., the sum of all values in the data set divided by the number of values in the data set.

Example: Find the mean of the values: 17, 33, 27, 45, 65

NOTE: Mean of raw or ungrouped data can also be directly computed using a calculator.
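The example can be checked in Python, either directly from Σx / n or with the standard library's `statistics.mean`: the sum is 187 and n = 5.

```python
from statistics import mean

values = [17, 33, 27, 45, 65]

x_bar = sum(values) / len(values)   # Σx / n = 187 / 5
print(x_bar)                        # 37.4
print(mean(values))                 # standard-library check: 37.4
```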

Case 2: Using Grouped Data (Frequency Distribution)

Two Methods can be used:


a.) Midpoint Method (long method)
b.) Unit Deviation Method or Coding Method (shortcut method) = to be
covered after the Prelims

Using Midpoint Method:


Steps Involved: a.) Get the midpoint, M, of each class.
b.) Multiply each midpoint by its corresponding frequency, M(f).
c.) Get the sum of the products in step b, Σ M(f).
d.) Divide the sum obtained in step c by the total number of frequencies (the sample/population size). Round the result to two decimal places.

Columns for Tabular Computation of Mean using Midpoint Method:


Class f M fM

Formula for solving the Mean by Midpoint Method:

$$\bar{x} = \frac{\sum fM}{n}, \quad n = \sum f \qquad \text{or:} \qquad \mu = \frac{\sum fM}{N}, \quad N = \sum f$$
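A sketch of the midpoint method applied to the rpm grouped frequency distribution from the earlier illustration, following the f, M, fM columns above (variable names are illustrative).

```python
# rpm grouped frequency distribution from the earlier illustration.
classes = [(64, 108), (109, 153), (154, 198), (199, 243), (244, 288)]
freq = [6, 1, 4, 2, 2]

midpoints = [(ll + ul) / 2 for ll, ul in classes]     # M column
fm = [f * m for f, m in zip(freq, midpoints)]          # fM column
n = sum(freq)                                          # n = Σf

grouped_mean = round(sum(fm) / n, 2)                   # ΣfM / n, two decimals
print(midpoints)        # [86.0, 131.0, 176.0, 221.0, 266.0]
print(sum(fm), n)       # 2325.0 15
print(grouped_mean)     # 155.0
```

The raw rpm values themselves average 2296 / 15 ≈ 153.07, so the grouped mean is an approximation: the midpoint method treats every value in a class as if it sat at the class midpoint.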

Using Unit Deviation or Coding Method:


Steps Involved: a.) Arbitrarily choose an “assumed mean”, A, which is usually the midpoint of the class in the middlemost position of the frequency table or the midpoint of the class with the highest frequency. The class containing the assumed mean is the “mean class”.
b.) Solve for the coded deviation or unit deviation, U, where U = (M – A)/C for each class (M = class midpoint). A shortcut for getting U is to assign a coded deviation of zero to the mean class; the other classes then get consecutive integers, positive going down from the mean class and negative going up from it (with the classes listed from lowest to highest).
c.) Multiply the frequency of each class by its corresponding unit deviation, fU.
d.) The mean is computed as

$$\bar{x} = A + \left(\frac{\sum fU}{n}\right) C$$

Where: A = assumed mean = midpoint of the arbitrarily chosen mean class
U = unit or coded deviation
f = frequency of each class
n = Σf = sample size
C = class size

Columns for Tabular Computation of Mean using Coding Method:


Class f U f U
2.) Median or the counting average
- the middlemost value in a given data set.
- the value in the data set that separates the upper 50% and lower 50% of the data.
- commonly used to represent the center or middle value of the data set.
- not sensitive to the presence of extremely high or low values (outliers) in the data set.
- used when one must determine whether the given value falls into the upper half
or lower half of the distribution.
- appropriate for use in both the normal and skewed distribution of data.
- applicable to ordinal scale of measurement and numeric values of variables.
- also used to denote measure of position or rank.

The Median is appropriate to use:


a.) When the distribution is grossly asymmetrical or skewed, or when the series contains a few extremely high or a few extremely low scores compared with the rest of the scores, the median is the most representative average. This is because the median depends only on the position of the middle value(s) in the ordered data, not on the actual sizes of the extreme scores.

Symbol Used for the Median:


Med = median; no standard symbol used
Determination of the Median (Med)
Case 1: Using Ungrouped/Raw Data
Steps: a.) Arrange the data in an array, i.e., in order from highest to lowest or from lowest to highest.
b.) If the number of values is odd, the middlemost value is the median; if it is even, the median is the average of the two middlemost values.
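The two steps can be sketched in Python and checked against the standard library's `statistics.median`; the data sets come from the mean example and from the median/mode illustration near the end of these notes. For ordinal categories the sort order must be supplied explicitly (the `levels` list is an assumption about how Elem, HS and Col rank), and that shortcut only works when the number of observations is odd, since categories cannot be averaged.

```python
from statistics import median as stdlib_median


def median(values):
    """Median of ungrouped numeric data: sort, then take the middle value(s)."""
    data = sorted(values)                   # Step a: arrange in order
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                    # odd n: the middlemost value
    return (data[mid - 1] + data[mid]) / 2  # even n: average of the two middle values


print(median([17, 33, 27, 45, 65]))       # 33
print(median([2, 3, 4, 5, 5, 8]))         # 4.5
print(stdlib_median([2, 3, 3, 4, 5, 5]))  # 3.5

# Ordinal data: order the categories explicitly instead of alphabetically.
levels = ["Elem", "HS", "Col"]
education = ["Elem", "HS", "HS", "HS", "Col"]
print(sorted(education, key=levels.index)[len(education) // 2])  # HS
```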

Case 2: Using Grouped Data (Frequency Distribution) = to be covered after the Prelims
Steps involved:
a.) Obtain the “less than” cumulative frequency for each class.
b.) Determine the median class, which is the first class whose “less than” cumulative frequency is equal to or greater than n/2 or ½(Σf).
c.) Determine the lower boundary of the median class, LB, the frequency of the median class, f’, and the class size, C.
d.) Compute the median as follows:

$$\text{Med} = LB + \left(\frac{n/2 - F}{f'}\right) C$$

Where: LB = lower boundary of the median class
n = Σf
f’ = actual frequency of the median class
F = “less than” cumulative frequency up to but not exceeding n/2
= “less than” cumulative frequency of the class just before the median class
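A sketch of the grouped-median formula on the rpm distribution from the earlier illustration. It assumes whole-number class limits, so LB = LL – 0.5; `grouped_median` is a hypothetical helper name.

```python
from itertools import accumulate

# rpm grouped frequency distribution from the earlier illustration.
classes = [(64, 108), (109, 153), (154, 198), (199, 243), (244, 288)]
freq = [6, 1, 4, 2, 2]


def grouped_median(classes, freq):
    """Med = LB + ((n/2 - F) / f') * C, assuming whole-number class limits."""
    n = sum(freq)
    half = n / 2
    cum = list(accumulate(freq))                           # "less than" cumulative f
    i = next(k for k, cf in enumerate(cum) if cf >= half)  # median class index
    ll, ul = classes[i]
    lb = ll - 0.5                         # lower boundary of the median class
    c = ul - ll + 1                       # class size
    big_f = cum[i - 1] if i > 0 else 0    # <f of the class just before the median class
    f_med = freq[i]                       # f' = frequency of the median class
    return lb + (half - big_f) / f_med * c


print(grouped_median(classes, freq))   # 159.125
```

Here n/2 = 7.5 falls in the 154-198 class (LB = 153.5, F = 7, f’ = 4, C = 45). The sorted raw rpm values have 155 as their 8th (middle) value, so, as with the mean, the grouped result is an approximation.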

NOTE: The median is also considered one of the measures of position or rank. The median serves a dual purpose: as a measure of central tendency and as a measure of position that divides the whole data file into the upper and lower 50% of the data.
3.) Mode or Inspectional Average
- the most common value in the data set, or the value/category in the data set with the highest frequency.
- also considered the “most typical or popular value” in the data set.
- applicable to variables at all levels or scales of measurement (nominal, ordinal, interval and ratio).
- the mode of a data file is not always unique; a data file may have one mode, several modes, or none at all.

Symbol Used for the Mode:

Mo = mode; no standard symbol used

Determination of the Mode:

Case 1: For Ungrouped Data (Raw Data)

- the mode is the value or category that occurs most often in the data set, i.e., the data value with the highest frequency.
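A short sketch with Python's `collections.Counter`, returning a list so that several modes, or none, can be reported. Treating "every value occurs only once" as "no mode" is an assumed convention; Python's own `statistics.multimode` would instead return every value in that case. The sample data sets come from the median/mode illustration and the mean example.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency (works for categories too).

    Assumption: if every value occurs only once, the data set is treated as
    having no mode and an empty list is returned.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [v for v, c in counts.items() if c == top]


print(modes(["Elem", "HS", "HS", "HS", "Col"]))  # ['HS']
print(modes([2, 3, 3, 4, 5, 5]))                 # [3, 5] (bimodal)
print(modes([17, 33, 27, 45, 65]))               # [] (no mode)
```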

Measures of Central Tendency: Symbol, Definition, Features and Levels of Measurement Covered

Mean
Symbol: µ = for population mean; x̄ = for sample mean (standard symbols)
Definition: Sum of the values, divided by the total number of values.
Features:
1.) The mean or average is probably the most commonly used method of describing central tendency.
2.) It is computed using all the data values.
3.) It is sensitive to the data distribution and is best suited to normal distributions rather than skewed distributions.
4.) It is applicable only to numerical data.
5.) The mean is used for computing other statistics, such as the variance.
6.) The mean of a data set is unique.
7.) The mean cannot be computed for an open-ended frequency distribution.
8.) The mean is affected by extremely low or high values, called “outliers”, and may not be the appropriate average to use in this situation.
Levels of Measurement Covered: Interval and ratio (only for numerical values)

Median
Symbol: Med (not a standard symbol)
Definition: Midpoint of a data set that has been ordered.
Features:
1.) The median is the score found at the exact middle of the ordered set of values.
2.) The median is used when one must find the center or middle value of the data set.
3.) The median is used for open-ended distributions.
4.) The median can be used for skewed distributions.
Levels of Measurement Covered: Ordinal, interval and ratio

Mode
Symbol: Mo (not a standard symbol)
Definition: Most frequent value.
Features:
1.) The mode is the most frequently occurring value in the set of scores.
2.) The mode is the easiest average to compute.
3.) The mode is not always unique. A data set may have one, several, or no mode.
Levels of Measurement Covered: Applicable to all types of variables

Illustration: Determination of Median and Mode


Example:
1.) Educational Attainment

Elem
HS
HS
HS
Col

2.) No. of family members

2
3
4
5
5
8

3.) No. of family members


2
3
3
4
5
5

Types of Distribution of Statistical Data



Relative Position of the three measures of central tendency in different distributions:


1.) For normal or bell-shaped distribution
Mean = median = mode
2.) For positively skewed or right-skewed distribution (tail-end on the right)
Mode < median < mean
3.) For negatively skewed or left-skewed distribution (tail-end on the left)
Mean < median < mode

Recommended Measures of Central Tendency to be used for a given scale of measurement:


1.) For nominal variables – the measure of central tendency is the mode.
2.) For ordinal variables – the measures of central tendency are the median and the mode.
3.) For interval/ratio variables – the measures of central tendency are the mean, median and mode.
