DATA ORGANIZATION
Frequency Distribution
II. For Grouped Frequency Distribution - for numeric data (interval or ratio scale)
CLASS | LL (score in exam) | UL (score in exam) | FREQUENCY
1st | 0 | 14 | 3
2nd | 15 | 29 | 5
3rd | 30 | 44 | 8
4th | 45 | 59 | 12
5th | 60 | 74 | 10
6th | 75 | 89 | 7
7th | 90 | 104 | 5
3.) Class size/width/length (C) - number of units of numeric value in a given class.
Solving for Class Size (C):
Using the Grouped Frequency Distribution
If only one class interval is used:
a.) C = UL – LL + 1
If two consecutive class intervals are used:
b.) C = LL of the next class – LL of the class immediately before it
Alternatively:
LL of next class = LL of previous class + C
c.) C = UL of the next class – UL of the class immediately before it
Alternatively:
UL of next class = UL of previous class + C
Soln:
CLASS
LL UL
1st 10
2nd
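The three class-size rules above can be checked against the exam-score table with a short Python sketch. It assumes whole-number scores (which is why rule (a) adds 1), and the helper names are illustrative, not from the notes.

```python
# Class size (C) from class limits, using the exam-score table.
# Assumption: whole-number data, classes given as (LL, UL) pairs in ascending order.
classes = [(0, 14), (15, 29), (30, 44), (45, 59), (60, 74), (75, 89), (90, 104)]

def size_from_one_class(ll, ul):
    """Rule (a): C = UL - LL + 1 for a single class interval."""
    return ul - ll + 1

def size_from_lower_limits(prev_cls, next_cls):
    """Rule (b): C = LL of the next class - LL of the class before it."""
    return next_cls[0] - prev_cls[0]

def size_from_upper_limits(prev_cls, next_cls):
    """Rule (c): C = UL of the next class - UL of the class before it."""
    return next_cls[1] - prev_cls[1]

print(size_from_one_class(*classes[0]))                # 14 - 0 + 1 = 15
print(size_from_lower_limits(classes[0], classes[1]))  # 15 - 0 = 15
print(size_from_upper_limits(classes[0], classes[1]))  # 29 - 14 = 15
```

All three rules agree (C = 15), which is a quick consistency check on a frequency table.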
5.) Range
Range = Highest Value – Lowest Value (for raw data)
Range = Class midpoint of the last class – Class midpoint of the first class (for grouped frequency distribution)
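A minimal sketch of both range definitions. The raw values are borrowed from the mean example later in these notes, and the grouped case uses the exam-score table; the function names are hypothetical.

```python
def raw_range(values):
    """Range = highest value - lowest value (raw data)."""
    return max(values) - min(values)

def grouped_range(classes):
    """Range = midpoint of the last class - midpoint of the first class.

    Assumes the (LL, UL) classes are listed in ascending order.
    """
    first_ll, first_ul = classes[0]
    last_ll, last_ul = classes[-1]
    return (last_ll + last_ul) / 2 - (first_ll + first_ul) / 2

exam_classes = [(0, 14), (15, 29), (30, 44), (45, 59), (60, 74), (75, 89), (90, 104)]
print(raw_range([17, 33, 27, 45, 65]))  # 65 - 17 = 48
print(grouped_range(exam_classes))      # 97.0 - 7.0 = 90.0
```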
6.) Class boundaries – the numerical values that separate the classes so that there are no gaps in the frequency distribution.
These include:
a.) Lower class boundary (LB) = the class boundary just before the LL
b.) Upper class boundary (UB) = the class boundary just after the UL
(Basic Rule: The class limits should have the same number of decimal places as the collected data, but the class
boundaries should have one additional decimal place and end in 5.)
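The basic rule can be sketched by placing each boundary half a measurement unit outside the class limits. The `unit` parameter is an assumption of this sketch (1 for whole numbers, 0.1 for one decimal place); `Decimal` is used for the decimal case so the result ends cleanly in 5.

```python
from decimal import Decimal

def class_boundaries(ll, ul, unit=1):
    """Return (LB, UB) for a class with limits LL and UL.

    unit is the smallest step in the recorded data. The boundaries sit
    half a unit outside the limits, so they carry one extra decimal
    place and end in 5.
    """
    half = unit / 2
    return ll - half, ul + half

print(class_boundaries(0, 14))    # (-0.5, 14.5)
print(class_boundaries(15, 29))   # (14.5, 29.5): UB of 1st class = LB of 2nd, no gap
print(class_boundaries(Decimal("2.1"), Decimal("2.5"), Decimal("0.1")))  # LB = 2.05, UB = 2.55
```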
2. Using LB and UB
1. Find the largest/highest (maximum) value, H, and the smallest/lowest (minimum) value, L, in the raw data
set/file.
2. Compute the Range = Maximum – Minimum; R = H - L
3. Decide on (select) the number of classes desired. This is usually between 5 and 20.
4. Find the class width/size by dividing the range by the number of classes and rounding up. Two things need
care here. First, you must round up, not off: normally 3.2 would round off to 3, but rounded up it becomes 4.
If the class midpoints are to be whole numbers, the class size can instead be rounded up to the next higher
odd number; for example, a class size of 4 becomes 5. Second, if the range divided by the number of classes
gives an integer value (no remainder), either add one to the number of classes or add one to the class width.
C = Range / No. of classes = (H – L) / No. of classes, rounded up to the next higher odd number
Ex: Given: Raw data with H = 135, L = 17, no. of classes = 9
Reqd: Class size/width, C, that is an odd number
Soln:
C = (H – L) / No. of classes = (135 – 17) / 9 = 118 / 9 ≈ 13.11, rounded up to 14, then to the next higher odd number: C = 15
5. Set up the classes, using the lowest value in the raw data file as the LL of the first (lowest) class.
6. Tally the data.
7. Find the frequencies.
8. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may not be
necessary to find the cumulative frequencies.
9. If necessary, find the relative frequencies and/or relative cumulative frequencies.
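Steps 4 to 8 can be sketched in Python as follows. This is an illustration under stated assumptions, not the notes' own procedure: the data are whole numbers (so UL = LL + C − 1), the "add one to the class width" option is the one taken when the division has no remainder, and all function names are hypothetical. Calling `make_classes(64, 45, 5)` reproduces the rpm class limits in the table that follows.

```python
import math
from itertools import accumulate

def class_width(high, low, num_classes, odd=True):
    """Step 4: C = (H - L) / no. of classes, always rounded UP.

    If the division leaves no remainder, one is added to the width.
    With odd=True, an even width is bumped to the next odd number
    so that class midpoints are whole numbers.
    """
    quotient = (high - low) / num_classes
    width = math.ceil(quotient)
    if width == quotient:          # no remainder: widen by one
        width += 1
    if odd and width % 2 == 0:     # next higher odd number
        width += 1
    return width

def make_classes(lowest, width, num_classes):
    """Step 5: the lowest raw value is the LL of the first class (whole-number data)."""
    return [(lowest + i * width, lowest + (i + 1) * width - 1)
            for i in range(num_classes)]

def tally(data, classes):
    """Steps 6-7: count the raw values that fall inside each (LL, UL) class."""
    return [sum(1 for x in data if ll <= x <= ul) for ll, ul in classes]

def less_than_cumulative(freqs):
    """Step 8: running total of the frequencies."""
    return list(accumulate(freqs))

print(class_width(135, 17, 9))               # 118 / 9 = 13.1 -> 14 -> 15
print(make_classes(64, 45, 5))               # [(64, 108), (109, 153), (154, 198), (199, 243), (244, 288)]
print(less_than_cumulative([6, 1, 4, 2, 2])) # [6, 7, 11, 13, 15]
```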
Steps 3 and 4: Decide on the number of classes (5 classes) and solve for the class size, C
CLASS LL (rpm) | CLASS UL (rpm) | FREQUENCY
64 | 108 | 6
109 | 153 | 1
154 | 198 | 4
199 | 243 | 2
244 | 288 | 2
Kinds of Frequencies:
1.) Frequency, f = the count obtained from the usual tally of the raw data.
2.) “less than” cumulative frequency, <f = sum of the frequencies up to and including a given class, i.e., the
number of values below the upper boundary of that class interval.
3.) “greater than” cumulative frequency, >f = sum of the frequencies of a given class and all classes above it,
i.e., the number of values above the lower boundary of that class.
4.) relative frequency, Rel f = frequency of a given class divided by the total frequency (f/n).
5.) Percentage relative frequency, % Rel f = relative frequency multiplied by 100 (expressed as a % value)
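Applied to the rpm frequency distribution, the five kinds of frequencies can be sketched with the standard library alone. Rounding the percentages to two decimal places is a presentation choice of this sketch.

```python
from itertools import accumulate

# Frequencies of the five rpm classes (64-108, 109-153, 154-198, 199-243, 244-288)
freqs = [6, 1, 4, 2, 2]
n = sum(freqs)                                      # total frequency, 15

less_than = list(accumulate(freqs))                 # <f: running total from the lowest class
greater_than = list(accumulate(freqs[::-1]))[::-1]  # >f: running total from the highest class
rel_f = [f / n for f in freqs]                      # Rel f = f / n
pct_rel_f = [round(100 * r, 2) for r in rel_f]      # % Rel f, rounded to 2 decimal places

for row in zip(freqs, less_than, greater_than, pct_rel_f):
    print(*row)
```

Note that the last <f and the first >f both equal n, which is a handy check on the arithmetic.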
III. Alternative Approach: Data Organization by Distribution Using a Stem-Leaf Plot
Stem-Leaf Plot - a method of organizing data that combines sorting and graphing.
- a data plot that uses part of each data value as the stem and the remaining part as the leaf
to form groups or classes.
- see pages 80 – 83 Bluman pdf
Where:
Stem = the highest place value(s) of a number (the leading digit or digits)
Leaf or leaves = the remaining place value(s), i.e., the trailing digits
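A minimal stem-leaf sketch, assuming non-negative whole numbers where the stem is the tens (and higher) digits and the leaf is the units digit. The five values are borrowed from the mean example in these notes; the function name is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (x // 10) to its sorted leaves (x % 10).

    Stems with no leaves between the smallest and largest stem are kept,
    so gaps in the data stay visible in the plot.
    """
    plot = defaultdict(list)
    for x in sorted(values):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

for stem, leaves in stem_and_leaf([17, 33, 27, 45, 65]).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Sorting first is what makes the leaves come out in order, which is the "sorting" half of the method; the printed rows are the "graphing" half.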
Data Presentation:
- usually done by representing the data as graphs or charts.
- Converting the organized data (freq. distribution) into graphs
1) Pie Charts – a circle that is divided into sections or wedges according to the percentage of frequencies in
each category of the distribution.
2) Histogram or Bar Graph - the frequency (y-axis) is plotted against the class boundaries (x-axis) for each
class interval.
3) Frequency Polygon – the frequency (y-axis) is plotted against the class midpoints (x-axis) of each class.
4) The ogive is a graph that represents the cumulative frequencies for the classes in a frequency
distribution.
4.a) “Less than” Cumulative Frequency Ogive - the “less than” cumulative frequency (y-axis) is
plotted against the upper boundaries (x-axis) of each class.
4.b) “Greater than” Cumulative Frequency Ogive – the “greater than” cumulative frequency (y-axis) is
plotted against the lower boundaries (x-axis) of each class.
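The x and y coordinates for each of these graphs can be computed from the rpm frequency distribution before handing them to any plotting tool. This sketch assumes whole-number data (boundaries 0.5 outside the limits) and only builds the point lists; it does not draw anything.

```python
from itertools import accumulate

classes = [(64, 108), (109, 153), (154, 198), (199, 243), (244, 288)]
freqs = [6, 1, 4, 2, 2]

lower_b = [ll - 0.5 for ll, _ in classes]            # LB of each class
upper_b = [ul + 0.5 for _, ul in classes]            # UB of each class
midpoints = [(ll + ul) / 2 for ll, ul in classes]    # class midpoints
less_than = list(accumulate(freqs))                  # <f
greater_than = list(accumulate(freqs[::-1]))[::-1]   # >f

histogram = list(zip(lower_b, upper_b, freqs))       # bar from LB to UB, height f
polygon = list(zip(midpoints, freqs))                # (midpoint, f)
less_than_ogive = list(zip(upper_b, less_than))      # (UB, <f)
greater_than_ogive = list(zip(lower_b, greater_than))  # (LB, >f)

print(less_than_ogive)
print(greater_than_ogive)
```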
Transport Type | Number of Students | Percentage of Students Preferring Transport Type | Angle Size for Pie Chart
Walking | 9 | 9/40 x 100 = 22.5% | 22.5% of 360° = 81°
Train | 10 | 10/40 x 100 = 25% | 25% of 360° = 90°
Tram | 6 | 6/40 x 100 = 15% | 15% of 360° = 54°
Car | 12 | 12/40 x 100 = 30% | 30% of 360° = 108°
Bicycle | 3 | 3/40 x 100 = 7.5% | 7.5% of 360° = 27°
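The pie-chart arithmetic in the table (percentage = f/n × 100, angle = percentage of 360°) can be sketched as follows; the function name is illustrative.

```python
def pie_slices(counts):
    """Return {category: (percentage, angle in degrees)} for a pie chart."""
    total = sum(counts.values())
    return {k: (f / total * 100, f / total * 360) for k, f in counts.items()}

counts = {"Walking": 9, "Train": 10, "Tram": 6, "Car": 12, "Bicycle": 3}
for transport, (percent, angle) in pie_slices(counts).items():
    print(f"{transport:8} {percent:5.1f}% {angle:6.1f} deg")
```

The angles must add up to 360°, which is a quick check that no category was left out.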
NOTE: Measures found by using all the data values in the population are called “parameters” while measures
obtained by using data values of samples are called “statistics”.
The central tendency of a distribution is an estimate of the "center" of a distribution of values. Measures of central
tendency are single values that describe the whole set of data points. There are three major measures of central tendency:
• Mean
• Median
• Mode
1.) Mean or arithmetic average or computational average - applicable only to quantitative data
$\bar{x} = \dfrac{\sum x}{n}$ for the sample mean   Or:   $\mu = \dfrac{\sum x}{N}$ for the population mean
The mean is the preferred measure:
a.) When the distribution consists of ratio or interval data that have no extreme values (values too high or
too low in comparison with the other values in the data set).
b.) When other statistics or parameters (like the standard deviation, coefficient of correlation, etc.) are
to be computed subsequently.
c.) When the distribution is normal or not greatly skewed. In such cases, the mean is usually preferred to either
the median or the mode because it provides a better estimate of the corresponding population parameter.
Example: Find the mean of the values: 17, 33, 27, 45, 65
NOTE: Mean of raw or ungrouped data can also be directly computed using a calculator.
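The example above can be worked with a one-line sketch of x̄ = Σx / n; Python's standard `statistics.mean` gives the same result and is shown alongside for comparison.

```python
import statistics

def mean(values):
    """x-bar = (sum of x) / n for raw (ungrouped) data."""
    return sum(values) / len(values)

data = [17, 33, 27, 45, 65]
print(mean(data))             # 187 / 5 = 37.4
print(statistics.mean(data))  # standard-library equivalent
```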
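For grouped data (M = class midpoint, f = class frequency):
$\bar{x} = \dfrac{\sum fM}{n} = \dfrac{\sum fM}{\sum f}$   or:   $\mu = \dfrac{\sum fM}{N} = \dfrac{\sum fM}{\sum f}$
Coded (assumed-mean) method:
$\bar{x} = A + \dfrac{\sum fU}{n}\,C$
where A = the assumed mean (the midpoint of a chosen class), U = the coded deviation of each class from the chosen
class (..., -2, -1, 0, 1, 2, ...), and C = the class size.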
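As a hedged check of both grouped-mean formulas, the sketch below applies them to the rpm frequency distribution. The choice of assumed-mean class (`a_index`) is arbitrary, and equal class widths are assumed so that C can be read off the first two lower limits; the function names are hypothetical.

```python
classes = [(64, 108), (109, 153), (154, 198), (199, 243), (244, 288)]
freqs = [6, 1, 4, 2, 2]

def grouped_mean(classes, freqs):
    """x-bar = sum(f * M) / sum(f), with M the class midpoint."""
    mids = [(ll + ul) / 2 for ll, ul in classes]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)

def coded_mean(classes, freqs, a_index):
    """x-bar = A + (sum(f * U) / n) * C.

    A is the midpoint of the class at position a_index, U is each class's
    position relative to that class, and C is the (equal) class size.
    """
    a_ll, a_ul = classes[a_index]
    A = (a_ll + a_ul) / 2
    C = classes[1][0] - classes[0][0]
    sum_fu = sum(f * (i - a_index) for i, f in enumerate(freqs))
    return A + sum_fu * C / sum(freqs)

print(grouped_mean(classes, freqs))     # 2325 / 15 = 155.0
print(coded_mean(classes, freqs, 2))    # 176 + (-7 * 45) / 15 = 155.0
```

Both methods give the same mean whichever class is chosen as A; the coded method simply keeps the hand arithmetic small.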
Case 2: Using Grouped Data (Frequency Distribution) – to be covered after the Prelims
Steps involved:
a.) Obtain the “less than” cumulative frequency for each class.
b.) Determine the median class, which is the first class whose “less than” cumulative frequency
reaches or exceeds n/2 = ½ (Σf).
c.) Determine the lower boundary of the median class, LB; the frequency of the median class, f′; the
“less than” cumulative frequency of the class immediately before the median class, F; and the class size, C.
d.) Compute the median as follows:
$\text{Med} = LB + \dfrac{n/2 - F}{f'}\,C$
NOTE: The median is also considered as one of the measures of position or rank. Median
assumes a dual use as a measure of central tendency and as a measure of position
that divides the whole data file into the upper and lower 50% of the data.
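Steps a) to d) can be sketched for the rpm frequency distribution. The sketch assumes whole-number data (so LB = LL − 0.5) and equal class widths; the function name is illustrative.

```python
from itertools import accumulate

def grouped_median(classes, freqs, unit=1):
    """Med = LB + ((n/2 - F) / f') * C for a grouped frequency distribution."""
    n = sum(freqs)
    half = n / 2
    less_than = list(accumulate(freqs))                       # step a
    k = next(i for i, cf in enumerate(less_than) if cf >= half)  # step b: median class
    LB = classes[k][0] - unit / 2                             # lower boundary of median class
    F = less_than[k - 1] if k > 0 else 0                      # <f of the class before it
    f_med = freqs[k]                                          # frequency of median class, f'
    C = classes[1][0] - classes[0][0]                         # class size
    return LB + (half - F) / f_med * C                        # step d

classes = [(64, 108), (109, 153), (154, 198), (199, 243), (244, 288)]
freqs = [6, 1, 4, 2, 2]
print(grouped_median(classes, freqs))  # 153.5 + (7.5 - 7) / 4 * 45 = 159.125
```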
3.) Mode or Inspectional Average
- the most common value in the data set or the value/category in the data set with
the highest frequency.
- also considered as the “most typical or popular value” in the data set.
- applicable to variables at all levels or scales of measurement (nominal, ordinal, interval and
ratio)
- the mode of a data file is not always unique; a data file may have one mode, several modes, or none at
all.
Examples:
Elem, HS, HS, HS, Col → mode = HS
2, 3, 4, 5, 5, 8 → mode = 5
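A mode finder that handles all three cases above (one mode, several modes, none) can be sketched with `collections.Counter`. It works for nominal labels as well as numbers. The "no mode" rule used here, that no value occurs more than once, is an assumption of this sketch.

```python
from collections import Counter

def modes(values):
    """Return every value that has the highest frequency.

    Assumption: if no value occurs more than once, the data set is
    treated as having no mode and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return [v for v, c in counts.items() if c == top]

print(modes(["Elem", "HS", "HS", "HS", "Col"]))  # ['HS']
print(modes([2, 3, 4, 5, 5, 8]))                 # [5]
print(modes([1, 2, 3]))                          # [] (no mode)
```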