
HOW TO ORGANIZE DATA

STEPS IN CONSTRUCTING A FREQUENCY DISTRIBUTION TABLE

1. Obtain the number of class intervals to be used.
2. Obtain the size of the intervals.
3. Compute the excess space.
4. Construct the table.
Number of class intervals

\[ K = 1 + 3.322 \log_{10} n \]
where
K = the number of class intervals, rounded upwards
n = the number of observations
Interval size

To obtain an initial estimate of the size of the intervals, we can use the formula:

\[ \text{ii} = \frac{\left(\text{LO} + \frac{\text{Ic}}{2}\right) - \left(\text{SO} - \frac{\text{Ic}}{2}\right)}{\text{Kr}} \]
where
ii = the initial estimate of the interval size
LO = the largest observed value
Ic = the smallest increment of change in data
SO = the smallest observed value
Kr = the K value obtained from Sturges’ rule, rounded upwards
EXCESS SPACE

– Excess = Space available – Required space, where
  – Space available = Kr*I (I = the actual interval size)
  – Required space = [LO + Ic/2] – [SO – Ic/2]
Table Construction

– With the interval size and the number of intervals known, we can construct the frequency distribution by first dividing the excess between the lowest and the highest ends of the data.
GRAPHICAL METHODS FOR DESCRIBING QUANTITATIVE DATA

– Frequency Histogram – a bar graph representation of a frequency distribution table. Class boundaries (CB) are marked along the horizontal axis and frequencies along the vertical axis. Each interval is drawn as a bar bounded by its class boundaries, with height equal to the corresponding frequency.
– Frequency Polygon – uses class midpoints (CM) to represent the intervals. The class midpoint is computed as the average of the lower class limit (LCL) and the upper class limit (UCL). Class limits are the visible limits of the intervals in the frequency distribution table.
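The midpoint and boundary definitions can be sketched in a few lines. The helper names are illustrative, the class 50-58 is a hypothetical class of integer data, and the boundary rule (limits widened by Ic/2 on each side) is an assumption consistent with the Required Space formula rather than something stated on this slide.

```python
def class_midpoint(lcl, ucl):
    """Class midpoint (CM): average of the lower and upper class limits."""
    return (lcl + ucl) / 2


def class_boundaries(lcl, ucl, ic=1):
    """Class boundaries (CB), assuming they lie Ic/2 outside the class limits."""
    return lcl - ic / 2, ucl + ic / 2


print(class_midpoint(50, 58))     # 54.0
print(class_boundaries(50, 58))   # (49.5, 58.5)
```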
Example: Juan Auto Repair

The manager of Juan Auto would like to get a better picture of the distribution of costs for engine tune-up parts. A sample of 50 customer invoices has been taken and the costs of parts, rounded to the nearest dollar, are listed below.

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Frequency Distribution

Guidelines for Selecting Number of Classes
– Use between 5 and 20 classes.
– Data sets with a larger number of elements usually require a larger number of classes.
– Smaller data sets usually require fewer classes.
– Use Sturges’ rule to compute the number of classes or class intervals:
K = 1 + 3.322*log10 n
where K = no. of classes, rounded upward
n = no. of observations
– Therefore, in the example: K = 1 + 3.322*log10(50) = 6.64 ≈ 7 classes
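As a quick check of the arithmetic above, here is a minimal sketch of Sturges’ rule. The function name is illustrative and not part of the original slides.

```python
import math


def sturges_classes(n):
    """Number of classes K = 1 + 3.322*log10(n), rounded upward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + 3.322 * math.log10(n))


k_raw = 1 + 3.322 * math.log10(50)
print(round(k_raw, 2))       # 6.64
print(sturges_classes(50))   # 7
```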
Frequency Distribution

Guidelines for Selecting Width of Classes
– Use classes of equal width.
– Approximate class width:

\[ \text{ii} = \frac{\left(\text{LO} + \frac{\text{Ic}}{2}\right) - \left(\text{SO} - \frac{\text{Ic}}{2}\right)}{\text{Kr}} \]
where
ii = the initial estimate of the interval size
LO = the largest observed value
Ic = the smallest increment of change in data
SO = the smallest observed value
Kr = the K value obtained from Sturges’ rule, rounded upwards
Interval Size or Class Width

– Initial estimate of class size = (109.5 – 51.5)/7 = 8.2857 ≈ 9 = I (actual class size)
– Note: The initial estimate is rounded upward to the precision of the data that you have obtained:
  – If the data are integers, I = 9
  – If the data have one decimal place, I = 8.3
  – If the data have two decimal places, I = 8.29
  – If the data have three decimal places, I = 8.286, and so on.
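The class-size computation can be sketched as follows. The function name and the `decimals` parameter are illustrative assumptions; rounding is done upward so that the space available is never smaller than the required space.

```python
import math


def interval_size(lo, so, ic, kr, decimals=0):
    """Return (ii, I): the initial estimate and the actual class size.

    ii = [(LO + Ic/2) - (SO - Ic/2)] / Kr, then rounded upward to
    `decimals` places to obtain I.
    """
    ii = ((lo + ic / 2) - (so - ic / 2)) / kr
    factor = 10 ** decimals
    return ii, math.ceil(ii * factor) / factor


ii, size = interval_size(lo=109, so=52, ic=1, kr=7)
print(round(ii, 4), size)   # 8.2857 9.0
```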
Computation of Excess Space

– Excess space = Space available – Required space, where
  – Space available = Kr*I
  – Required space = [LO + Ic/2] – [SO – Ic/2]
Therefore, in our example,
– Space available = 7*9 = 63
– Required space = [109 + ½] – [52 – ½] = 58
– Excess space = 63 – 58 = 5
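The same computation as a short sketch, with an illustrative function name:

```python
def excess_space(lo, so, ic, kr, size):
    """Return (space available, required space, excess space)."""
    available = kr * size
    required = (lo + ic / 2) - (so - ic / 2)
    return available, required, available - required


print(excess_space(lo=109, so=52, ic=1, kr=7, size=9))   # (63, 58.0, 5.0)
```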
Dividing the Excess Space

– Divide the excess space between the lowest and highest ends of the data set, that is, split it in two.
– In our example, the excess space is 5.
– Because 5 is odd, half of it is 2.5, which is not a whole number.
– Since our data are integers, the share given to the lowest end must also be an integer, so we round 2.5 up or down.
– In case of rounding down, subtract 2 from the lowest end and add 3 to the highest end.
– In case of rounding up, subtract 3 from the lowest end and add 2 to the highest end.
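One way to turn this step into class limits for integer data is sketched below. The function name and the `round_up` flag are illustrative assumptions; the share of the excess given to the lowest end is rounded down by default.

```python
def class_limits(so, kr, size, excess, round_up=False):
    """Return the (lower, upper) class limits for integer data.

    The lowest end receives half of the excess; when the excess is odd,
    `round_up` decides whether that half is rounded up or down.
    """
    low_share = excess // 2 + (excess % 2 if round_up else 0)
    start = so - low_share
    return [(start + i * size, start + (i + 1) * size - 1) for i in range(kr)]


print(class_limits(so=52, kr=7, size=9, excess=5)[0])                  # (50, 58)
print(class_limits(so=52, kr=7, size=9, excess=5, round_up=True)[0])   # (49, 57)
```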
Example: Juan Auto Repair

– Frequency Distribution in the case of rounding down

Cost       Frequency
50-58      2
59-67      7
68-76      17
77-85      10
86-94      4
95-103     6
104-112    4
Total      50
Example: Juan Auto Repair

– Frequency Distribution in the case of rounding up

Cost       Frequency
49-57      2
58-66      6
67-75      17
76-84      10
85-93      5
94-102     6
103-111    4
Total      50
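Both frequency tables can be reproduced by tallying the 50 invoice costs. This is a minimal sketch; the function name is illustrative, and it assumes integer data with equal-width classes.

```python
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def frequency_table(data, start, size, k):
    """Tally integer data into k classes of width `size` beginning at `start`."""
    counts = [0] * k
    for x in data:
        idx = (x - start) // size
        if not 0 <= idx < k:
            raise ValueError(f"{x} falls outside the class intervals")
        counts[idx] += 1
    return [((start + i * size, start + (i + 1) * size - 1), counts[i])
            for i in range(k)]


for (low, high), freq in frequency_table(costs, start=50, size=9, k=7):
    print(f"{low}-{high}: {freq}")
```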
Example: Juan Auto Repair

– Relative Frequency and Percent Frequency Distributions
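Assuming the usual definitions (relative frequency = class frequency divided by n, percent frequency = relative frequency times 100), the values for the rounding-down distribution of the Juan Auto data could be computed as in this sketch:

```python
classes = ["50-58", "59-67", "68-76", "77-85", "86-94", "95-103", "104-112"]
freqs = [2, 7, 17, 10, 4, 6, 4]

n = sum(freqs)
relative = [f / n for f in freqs]        # proportion of items in each class
percent = [100 * f / n for f in freqs]   # the same proportion as a percentage

for label, rel, pct in zip(classes, relative, percent):
    print(f"{label:>8}  {rel:.2f}  {pct:.0f}%")
```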
Dot Plot

– One of the simplest graphical summaries of data is a dot plot.
– A horizontal axis shows the range of data values.
– Then each data value is represented by a dot placed above the axis.
Histogram

– Another common graphical presentation of quantitative data is a histogram.
– The variable of interest is placed on the horizontal axis.
– A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
– Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
Cumulative Distributions

– Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
– Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class.
– Cumulative percent frequency distribution: shows the percentage of items with values less than or equal to the upper limit of each class.
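A minimal sketch of the three cumulative distributions, using the rounding-down frequencies from the Juan Auto example (the variable names are illustrative):

```python
from itertools import accumulate

upper_limits = [58, 67, 76, 85, 94, 103, 112]
freqs = [2, 7, 17, 10, 4, 6, 4]

n = sum(freqs)
cum_freq = list(accumulate(freqs))        # items <= each upper class limit
cum_rel = [c / n for c in cum_freq]       # as proportions
cum_pct = [100 * c / n for c in cum_freq] # as percentages

for limit, c, pct in zip(upper_limits, cum_freq, cum_pct):
    print(f"<= {limit}: {c} ({pct:.0f}%)")
```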
Ogive

– An ogive is a graph of a cumulative distribution.
– The data values are shown on the horizontal axis.
– Shown on the vertical axis are the:
  – cumulative frequencies, or
  – cumulative relative frequencies, or
  – cumulative percent frequencies
– The frequency (one of the above) of each class is plotted as a point.
– The plotted points are connected by straight lines.
Exploratory Data Analysis

– The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.
– One such technique is the stem-and-leaf display.
Stem-and-Leaf Display

– A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
– It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
– The first digits of each data item are arranged to the left of a vertical line.
– To the right of the vertical line we record the last digit for each item in rank order.
– Each line in the display is referred to as a stem.
– Each digit on a stem is a leaf.

8 | 57
9 | 3678
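The construction can be sketched for non-negative integers with a leaf unit of 1. The function name is illustrative, and the six input values are the ones implied by the two stems shown above.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Build stem-and-leaf rows for non-negative integers (leaf unit 1)."""
    rows = defaultdict(list)
    for x in sorted(data):            # rank order
        stem, leaf = divmod(x, 10)    # first digits | last digit
        rows[stem].append(str(leaf))
    return [f"{stem} | {''.join(leaves)}" for stem, leaves in sorted(rows.items())]


for row in stem_and_leaf([97, 85, 93, 98, 87, 96]):
    print(row)
# 8 | 57
# 9 | 3678
```

Stems with no data are simply omitted in this sketch.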
Stem-and-Leaf Display

– Leaf Units
  – A single digit is used to define each leaf.
  – In the preceding example, the leaf unit was 1.
  – Leaf units may be 100, 10, 1, 0.1, and so on.
  – Where the leaf unit is not shown, it is assumed to equal 1.
Example: Leaf Unit = 0.1
Example: Leaf Unit = 10
Example: Juan Auto Repair
Tabular and Graphical Procedures
NUMERICAL DESCRIPTIVE MEASURES

– Measures of Central Tendency or Location: Mean, Median and Mode
– Measures of Dispersion or Variability: Range and Standard Deviation
MEAN

– The sum of the values in the data group divided by the number of values. The formula is:

\[ \bar{X} = \frac{\sum X}{n} \]
where X = the raw data or observations
n = the number of observations or values
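Applied to the 50 Juan Auto invoice costs, the formula can be sketched as:

```python
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

mean = sum(costs) / len(costs)   # sum of X divided by n
print(sum(costs), len(costs), mean)   # 3949 50 78.98
```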
Grouped Data Mean

\[ \bar{X} = \frac{\sum f x}{n} \]
where f = frequency of each class interval
x = class midpoints
n = total number of observations
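A minimal sketch of the grouped mean, using the rounding-down frequency distribution from the Juan Auto example (the function name is illustrative):

```python
def grouped_mean(limits, freqs):
    """Grouped mean: sum of f*x over n, where x is each class midpoint."""
    n = sum(freqs)
    midpoints = [(lcl + ucl) / 2 for lcl, ucl in limits]
    return sum(f * x for f, x in zip(freqs, midpoints)) / n


limits = [(50, 58), (59, 67), (68, 76), (77, 85), (86, 94), (95, 103), (104, 112)]
freqs = [2, 7, 17, 10, 4, 6, 4]
print(grouped_mean(limits, freqs))   # 79.38
```

The result differs slightly from the mean of the raw costs because every value in a class is represented by the class midpoint.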
MEDIAN

– The middle value in arrayed data (data which have been arranged in ascending order).
Grouped Median

\[ \tilde{X} = \text{LCB}_{med} + \left[ \frac{\frac{n}{2} - F}{f_{med}} \right] I \]
where:
LCBmed = lower class boundary of the median class
n = number of observations
F = cumulative frequency of the class before the median class
fmed = frequency of the median class
I = class interval size
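The grouped median can be sketched as below, again with the rounding-down distribution from the Juan Auto example. The function name is illustrative, and the median class is taken to be the first class whose cumulative frequency reaches n/2, which is an assumption about how the median class is located.

```python
def grouped_median(limits, freqs, ic=1):
    """Grouped median: LCBmed + ((n/2 - F) / fmed) * I."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    cum_before = 0                       # F: cumulative frequency before the class
    for (lcl, ucl), f in zip(limits, freqs):
        if cum_before + f >= n / 2:      # first class reaching n/2 is the median class
            lcb = lcl - ic / 2           # lower class boundary
            size = ucl - lcl + ic        # class interval size I
            return lcb + (n / 2 - cum_before) / f * size
        cum_before += f


limits = [(50, 58), (59, 67), (68, 76), (77, 85), (86, 94), (95, 103), (104, 112)]
freqs = [2, 7, 17, 10, 4, 6, 4]
print(round(grouped_median(limits, freqs), 2))   # 75.97
```

Here the median class is 68-76, so the result is 67.5 + ((25 – 9)/17)*9.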
MODE

– The value that occurs the greatest number of times in a data set.
Grouped Mode formula

\[ M = \text{LCB}_{mod} + \left[ \frac{\Delta f_1}{\Delta f_1 + \Delta f_2} \right] I \]
where:
LCBmod = lower class boundary of the modal class
Δf1 = the difference between the frequency of the modal class and the class immediately before it
Δf2 = the difference between the frequency of the modal class and the class immediately after it
I = size of the class interval
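A minimal sketch of the grouped mode for the rounding-down distribution of the Juan Auto example. The function name is illustrative. Two choices are assumptions not covered by the slide: a missing neighbour class (when the modal class is first or last) is treated as having frequency 0, and ties are resolved in favour of the first class with the highest frequency.

```python
def grouped_mode(limits, freqs, ic=1):
    """Grouped mode: LCBmod + (df1 / (df1 + df2)) * I."""
    m = max(range(len(freqs)), key=freqs.__getitem__)      # index of the modal class
    before = freqs[m - 1] if m > 0 else 0                  # assumed 0 if no class before
    after = freqs[m + 1] if m < len(freqs) - 1 else 0      # assumed 0 if no class after
    df1 = freqs[m] - before
    df2 = freqs[m] - after
    lcl, ucl = limits[m]
    return (lcl - ic / 2) + df1 / (df1 + df2) * (ucl - lcl + ic)


limits = [(50, 58), (59, 67), (68, 76), (77, 85), (86, 94), (95, 103), (104, 112)]
freqs = [2, 7, 17, 10, 4, 6, 4]
print(round(grouped_mode(limits, freqs), 2))   # 72.79
```

The modal class is 68-76, with Δf1 = 17 – 7 = 10 and Δf2 = 17 – 10 = 7.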
RANGE

– The simplest measure of spread or variability. It is the difference between the highest score and the lowest score in any given set of data or distribution. For data grouped into intervals, the range becomes the difference between the upper boundary of the highest class and the lower boundary of the lowest class.
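Both versions of the range can be sketched briefly. The function names are illustrative, the inputs are the extreme costs (52 and 109) and the rounding-down class limits from the Juan Auto example, and class boundaries are assumed to lie Ic/2 outside the class limits.

```python
def data_range(data):
    """Range of raw data: highest value minus lowest value."""
    return max(data) - min(data)


def grouped_range(limits, ic=1):
    """Range of grouped data: upper boundary of the highest class
    minus lower boundary of the lowest class."""
    lowest_lcl = limits[0][0]
    highest_ucl = limits[-1][1]
    return (highest_ucl + ic / 2) - (lowest_lcl - ic / 2)


print(data_range([52, 75, 109]))                 # 57
print(grouped_range([(50, 58), (104, 112)]))     # 63.0
```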
Standard Deviation

– The most useful measure of variability. It is a special form of average deviation from the mean, and it is affected by every individual value in a given distribution.
Ungrouped Population Standard Deviation

\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]
where X = the individual values of all the items
μ = the population mean
N = the population size
Grouped Population Standard Deviation

\[ \sigma = \sqrt{\frac{\sum f (X - \mu)^2}{N}} \]
where X = the class midpoints
f = frequency of each class interval
μ = the population mean
N = the population size
Ungrouped Sample Standard Deviation

\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]
where X = the individual values of all the items
Xbar = the sample mean
n = sample size
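The population and sample formulas for ungrouped data differ only in the divisor (N versus n – 1). A minimal sketch, cross-checked against Python's standard `statistics` module; the function names are illustrative and the ten values are the first row of the Juan Auto data:

```python
import math
import statistics


def population_sd(values):
    """sqrt( sum (X - mu)^2 / N )"""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def sample_sd(values):
    """sqrt( sum (X - Xbar)^2 / (n - 1) )"""
    xbar = sum(values) / len(values)
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (len(values) - 1))


row = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62]
print(population_sd(row), statistics.pstdev(row))   # the two values should agree
print(sample_sd(row), statistics.stdev(row))        # the two values should agree
```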
Grouped Sample Standard Deviation

\[ s = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}} \]
where X = the class midpoints
f = frequency of each class interval
Xbar = the sample mean
n = sample size
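The grouped formulas can be sketched with one function, assuming the mean used is the grouped mean (sum of f*X over n). The function name and the `sample` flag are illustrative; the midpoints and frequencies come from the rounding-down distribution of the Juan Auto example.

```python
import math
import statistics


def grouped_sd(midpoints, freqs, sample=True):
    """Grouped standard deviation: sqrt( sum f*(X - mean)^2 / divisor ),
    with divisor n - 1 for a sample and N for a population."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    squares = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
    return math.sqrt(squares / (n - 1 if sample else n))


midpoints = [54, 63, 72, 81, 90, 99, 108]
freqs = [2, 7, 17, 10, 4, 6, 4]

# Cross-check: repeat each midpoint f times and use the standard library.
expanded = [x for x, f in zip(midpoints, freqs) for _ in range(f)]
print(grouped_sd(midpoints, freqs), statistics.stdev(expanded))
print(grouped_sd(midpoints, freqs, sample=False), statistics.pstdev(expanded))
```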
