STATISTICS
FREQUENCY DISTRIBUTION
A frequency table lists categories (or classes) of
scores, along with counts (or frequencies) of the
number of scores that fall into each category.
The frequency for a particular class is the number of
original scores that fall into that class.
The types of frequency
distribution
1. Categorical
2. Ungrouped
3. Grouped
Categorical or Qualitative Frequency Distributions
AB B A O B
O B O A O
B O B B B
A O AB AB O
A B AB O A
Represent the blood types as classes and the number of
occurrences of each blood type as frequencies. The
table (distribution) below summarizes the data.
Class (Blood type) Frequency
Distribution
A 5
B 8
O 8
AB 4
Total 25
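The tally above can be reproduced with a short script. This is a minimal Python sketch, assuming the 25 blood types are read row by row from the listing; the variable names are illustrative only.

```python
from collections import Counter

# Blood types of the 25 individuals, row by row as listed above.
blood_types = (
    "AB B A O B "
    "O B O A O "
    "B O B B B "
    "A O AB AB O "
    "A B AB O A"
).split()

# Counter tallies how many times each class (blood type) occurs.
frequency = Counter(blood_types)

for blood_type in ("A", "B", "O", "AB"):
    print(blood_type, frequency[blood_type])
print("Total", sum(frequency.values()))
```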
Quantitative Frequency Distribution – Ungrouped
An ungrouped frequency distribution simply lists the data
values with the corresponding number of times or frequency
count with which each value occurs.
Day 1 2 3 4 5 6 7 8 9 10 11 12 13
Defects 10 10 6 12 6 9 16 20 11 10 11 11 9
Day 14 15 16 17 18 19 20 21 22 23 24 25
Defects 12 11 7 10 11 14 21 12 6 10 11 6
Solution: The frequency distribution for the number of
defects is shown below:
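One way to build the ungrouped distribution is to count how often each defect value occurs. The sketch below is in Python and assumes the 25 daily defect counts are entered in day order; the names `defects` and `ungrouped` are illustrative.

```python
from collections import Counter

# Number of defects on days 1 to 25, as listed above.
defects = [10, 10, 6, 12, 6, 9, 16, 20, 11, 10, 11, 11, 9,
           12, 11, 7, 10, 11, 14, 21, 12, 6, 10, 11, 6]

# Pair each distinct value with its frequency, sorted by value.
ungrouped = sorted(Counter(defects).items())

print("Defects  Frequency")
for value, count in ungrouped:
    print(f"{value:>7}  {count:>9}")
print("Total   ", sum(count for _, count in ungrouped))
```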
CLASS INTERVAL   TALLY   FREQUENCY   CLASS BOUNDARY   CLASS MARK   Relative Frequency (RF)   Less than Cumulative Frequency (<CF)   Greater than Cumulative Frequency (>CF)
Class intervals (each runs from its lower class limit to its upper class limit):
15-19
20-24
25-29
30-34
35-39
Class boundaries are the numbers used to separate classes, but
without the gaps created by the class limits. They are obtained by
increasing the upper class limits and decreasing the lower class
limits by the same amount so that there are no gaps between
consecutive classes. The amount to be added or subtracted is
one-half the difference between the upper limit of one class and the
lower limit of the following class.
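The rule above can be sketched in Python. The class limits 5-9 through 35-39 are the ones used in this document's tables; the function name is hypothetical, and the class mark is taken here as the midpoint of the class limits, which is the usual definition and matches the class marks shown in the tables.

```python
def class_boundaries(limits):
    """Return (lower, upper) class boundaries for ascending class limits.

    Assumes every pair of consecutive classes is separated by the same gap.
    """
    gap = limits[1][0] - limits[0][1]   # lower limit of next class - upper limit
    half = gap / 2                      # amount added or subtracted
    return [(lower - half, upper + half) for lower, upper in limits]

class_limits = [(5, 9), (10, 14), (15, 19), (20, 24),
                (25, 29), (30, 34), (35, 39)]

boundaries = class_boundaries(class_limits)
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

for limits, bounds, mark in zip(class_limits, boundaries, class_marks):
    print(limits, bounds, mark)
```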
CLASS INTERVAL   FREQUENCY   CLASS BOUNDARY
5-9              4           4.5-9.5

CLASS INTERVAL   FREQUENCY   Relative Frequency (RF)
25-29            20          0.20
30-34            15          0.15
35-39            10          0.10
Total            100
Cumulative frequency is the accumulated frequency that is less than (<) or greater than (>) a stated value. We obtain the greater than (>)
cumulative frequency if the frequencies are summed from the bottom up to find the number of
observations greater than a specified lower class boundary. The less than (<) cumulative frequency is
constructed if the frequencies are summed from the top down to find the number of observations
less than a particular upper class boundary.
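Both running totals can be computed directly. This Python sketch assumes the seven class frequencies that appear in this document's tables (classes 5-9 through 35-39, total 100); the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies of the classes 5-9, 10-14, ..., 35-39.
frequencies = [4, 8, 17, 26, 20, 15, 10]

# Less than CF: sum from the top of the table down.
less_than_cf = list(accumulate(frequencies))

# Greater than CF: sum from the bottom of the table up, then restore table order.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for f, lt, gt in zip(frequencies, less_than_cf, greater_than_cf):
    print(f, lt, gt)
```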
CLASS INTERVAL   TALLY                        FREQUENCY   <CF   >CF
20-24            IIII-IIII-IIII-IIII-IIII-I   26          55    71
25-29            IIII-IIII-IIII-IIII          20          75    10+15+20 = 45
30-34            IIII-IIII-IIII               15          90    10+15 = 25
STEM LEAF FREQUENCY
1 (0-9) 99 2
2 (0-9) 01123445555566777778899 23
3 (0-9) 00114 5
STEM AND LEAF DIAGRAM (2 lines)
STEM LEAF FREQUENCY
1 (5-9) 99 2
2 (0-4) 0112344 7
2 (5-9) 5555566777778899 16
3 (0-4) 00114 5
STEM AND LEAF DIAGRAM (3 lines)
2 (0-1) 011 3
2 (2-3) 23 2
2 (4-5) 4455555 7
2 (6-7) 6677777 7
2 (8-9) 8899 4
3 (0-1) 0011 4
3 (2-3) 0
3 (4-5) 4 1
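Splitting each stem into several lines can be sketched as follows. The 30 data values are an assumption: they are read back from the stems and leaves shown in the diagrams, since the raw data are not listed. The function name and the `lines_per_stem` parameter are hypothetical, and the sketch only handles two-digit values with 1, 2 or 5 lines per stem.

```python
from collections import defaultdict

# Values reconstructed from the leaves in the diagrams (assumption).
data = [19, 19, 20, 21, 21, 22, 23, 24, 24, 25, 25, 25, 25, 25, 26, 26,
        27, 27, 27, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 34]

def stem_and_leaf(values, lines_per_stem=1):
    """Map (stem, line index) to the string of leaves on that line."""
    width = 10 // lines_per_stem          # how many leaf digits each line covers
    rows = defaultdict(str)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)    # tens digit is the stem, units digit the leaf
        rows[(stem, leaf // width)] += str(leaf)
    return dict(rows)

two_line = stem_and_leaf(data, lines_per_stem=2)
five_line = stem_and_leaf(data, lines_per_stem=5)

for (stem, line), leaves in sorted(two_line.items()):
    print(stem, leaves, len(leaves))
```

Lines with no leaves (for example leaves 2-3 of stem 3) are simply absent from the returned dictionary.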
GRAPHICAL REPRESENTATION OF FREQUENCY DISTRIBUTION
a. HISTOGRAM
The histogram is a set of vertical bars whose bases lie on
the horizontal axis and are centered on the class marks.
The width of each bar corresponds to the class size and the
height corresponds to the frequency.
A histogram differs from a bar chart in that the bases of the
bars are marked by the class boundaries rather than the class limits.
[Figure: Histogram. x-axis: IQ (80.00 to 160.00); y-axis: Frequency]
Quantitative Frequency Distribution – Grouped
CLASS INTERVAL   TALLY   FREQUENCY   CLASS BOUNDARY   CLASS MARK   Relative Frequency (RF)   Less than Cumulative Frequency (<CF)   Greater than Cumulative Frequency (>CF)
[Figure: FREQUENCY HISTOGRAM. x-axis: CLASS BOUNDARY; y-axis: FREQUENCY (values marked: 4, 8, 10, 15, 17, 20)]
b. FREQUENCY POLYGON
The frequency polygon is a modification of the histogram: it is a line graph
in which the class frequencies are plotted against the class
marks. To close the polygon, an extra class mark with zero frequency must be added at each end.
The frequency polygon can also be obtained by connecting the midpoints of the
tops of the rectangles in the histogram.
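The points of a closed frequency polygon can be listed as below. This Python sketch assumes the class marks and frequencies from this document's 5-9 through 35-39 table, equal class sizes, and zero frequency at the two extra class marks; the names are illustrative.

```python
class_marks = [7, 12, 17, 22, 27, 32, 37]
frequencies = [4, 8, 17, 26, 20, 15, 10]

# Equal class sizes are assumed, so consecutive class marks are one class size apart.
class_size = class_marks[1] - class_marks[0]

# Add one extra class mark (with zero frequency) at each end to close the polygon.
polygon_x = [class_marks[0] - class_size] + class_marks + [class_marks[-1] + class_size]
polygon_y = [0] + frequencies + [0]

points = list(zip(polygon_x, polygon_y))
print(points)
```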
[Figure: frequency polygon. x-axis: CLASS MARK; y-axis: FREQUENCY]
c. OGIVES
A line graph showing the cumulative frequency of a distribution is called an ogive. For the “less than”
ogive, the “less than” cumulative frequencies are plotted against the upper class boundaries. For the
“greater than” ogive, the “greater than” cumulative frequencies are plotted directly above the lower
class boundaries. These graphs are useful in estimating the number of observations that are less than
or more than a specified value.
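Reading a value off the “less than” ogive amounts to linear interpolation between plotted points. The Python sketch below assumes the class boundaries 4.5 to 39.5 and the less than cumulative frequencies of this document's 100-observation table; the function name is hypothetical.

```python
# Class boundaries 4.5, 9.5, ..., 39.5 and the <CF at each upper boundary.
boundaries = [4.5 + 5 * i for i in range(8)]
less_than_cf = [4, 12, 29, 55, 75, 90, 100]

def estimate_less_than(value):
    """Estimate how many observations are less than `value` from the ogive."""
    # The ogive starts at zero at the lowest class boundary.
    points = list(zip(boundaries, [0] + less_than_cf))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            # Straight-line interpolation along this segment of the ogive.
            return y0 + (value - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("value lies outside the class boundaries")

below_22 = estimate_less_than(22.0)
above_22 = less_than_cf[-1] - below_22   # the remaining observations
print(below_22, above_22)
```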
CLASS BOUNDARY   CLASS MARK   Relative Frequency (RF)   Less than Cumulative Frequency (<CF)   Greater than Cumulative Frequency (>CF)
4.5-9.5          7            0.04                      4                                      100
19.5-24.5        22           0.26                      55                                     71
24.5-29.5        27           0.20                      75                                     45
29.5-34.5        32           0.15                      90                                     25
[Figure: OGIVE. x-axis: CLASS BOUNDARY (4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5); y-axis: FREQUENCY]
Other Shape: SKEWED TO THE LEFT
[Figure: x-axis: CLASS BOUNDARY (0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5); y-axis: FREQUENCY (0 to 80)]
Other Shape: PLATYKURTIC
[Figure: x-axis: CLASS BOUNDARY; y-axis: FREQUENCY]
Other Shape: LEPTOKURTIC
[Figure: x-axis: CLASS BOUNDARY; y-axis: FREQUENCY]
DESCRIPTIVE STATISTICS
Measures of Location
Measures of Variability
Measures of Shape
MEASURES OF CENTRAL TENDENCY (location)
A measure of central tendency gives a single value that acts as a
representative average of the values of all the outcomes of your
experiment. Three parameters that measure the center of the
distribution in some sense are of interest. These parameters are called the
population mean, the population median and the population mode.
Central Tendency
refers to the Middle
of the Distribution
A. THE MEAN
For Ungrouped Data:
Let x1, x2, x3, ..., xn be n observations of a random variable X. The sample mean,
denoted by \bar{x}, is the arithmetic average of these values. That is,
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
for population mean
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
for sample mean
For Grouped Data
\[ \mu \ \text{or} \ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} \]
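Both forms of the mean can be checked numerically. This Python sketch assumes the 25 defect counts and the 5-9 through 35-39 frequency table used elsewhere in this document; the function names are hypothetical, and in the grouped form x_i is taken to be the class mark.

```python
def mean_ungrouped(values):
    """Arithmetic average: sum of the observations divided by their number."""
    return sum(values) / len(values)

def mean_grouped(class_marks, frequencies):
    """Sum of f_i * x_i divided by the sum of f_i, with x_i the class marks."""
    total = sum(f * x for f, x in zip(frequencies, class_marks))
    return total / sum(frequencies)

defects = [10, 10, 6, 12, 6, 9, 16, 20, 11, 10, 11, 11, 9,
           12, 11, 7, 10, 11, 14, 21, 12, 6, 10, 11, 6]
class_marks = [7, 12, 17, 22, 27, 32, 37]
frequencies = [4, 8, 17, 26, 20, 15, 10]

print(mean_ungrouped(defects))
print(mean_grouped(class_marks, frequencies))
```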
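B. THE MEDIAN
For Ungrouped Data:
\[ \tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \dfrac{x_{(n/2)} + x_{(n/2)+1}}{2} & \text{if } n \text{ is even} \end{cases} \]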
For Grouped Data:
When the data are grouped into a frequency distribution, the median is
obtained by finding the cell that has the middle number and then
interpolating within the cell.
\[ \tilde{x} = Lb_i + \left( \frac{\frac{n}{2} - {<}cf_{i-1}}{f_i} \right) (\text{class size}) \]
\[ \tilde{x} = Ub_i - \left( \frac{\frac{n}{2} - {>}cf_{i-1}}{f_i} \right) (\text{class size}) \]
where:
Lbi = lower class boundary of the interpolated interval
Ubi = upper class boundary of the interpolated interval
<cfi-1 = less than cumulative frequency of the class before the interpolated
interval
>cfi-1 = greater than cumulative frequency of the class before the
interpolated interval when counting from the bottom up (the next higher class)
fi = frequency of the interpolated interval
i = interpolated interval
n = number of data points
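The interpolation can be sketched in Python. The data are assumed to be this document's 5-9 through 35-39 table (n = 100, class size 5); the function name is hypothetical. The last line applies the upper-boundary form by hand, using the greater than cumulative frequency of the next higher class (45), to show that both forms agree.

```python
def median_grouped(lower_boundaries, frequencies, class_size):
    """Interpolate the median within the class that contains the n/2-th value."""
    n = sum(frequencies)
    cumulative = 0                       # <cf of the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= n / 2:      # this is the interpolated interval
            return lower + (n / 2 - cumulative) / f * class_size
        cumulative += f
    raise ValueError("no frequencies given")

lower_boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
frequencies = [4, 8, 17, 26, 20, 15, 10]

median_from_lower = median_grouped(lower_boundaries, frequencies, class_size=5)

# Upper-boundary form for the same class (19.5-24.5).
median_from_upper = 24.5 - (100 / 2 - 45) / 26 * 5

print(median_from_lower, median_from_upper)
```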
C. THE MODE
\[ \text{Mode} = L_B + \left( \frac{d_1}{d_1 + d_2} \right) (\text{class size}) \]
LB = lower boundary of the modal class
Modal Class is the category containing the highest frequency
d1 = difference between the frequency of the modal class and the frequency of the class above it when the
scores are arranged from lowest to highest
d2 = difference between the frequency of the modal class and the frequency of the class below it when the
scores are arranged from lowest to highest
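A minimal Python sketch of the grouped mode follows, again assuming this document's 5-9 through 35-39 table. The function name is hypothetical; it uses the first class with the highest frequency, and it treats a missing neighbouring class as having frequency 0, which is an assumption not stated in the text.

```python
def mode_grouped(lower_boundaries, frequencies, class_size):
    """Mode = LB + d1 / (d1 + d2) * class size for the first modal class."""
    i = frequencies.index(max(frequencies))          # position of the modal class
    before = frequencies[i - 1] if i > 0 else 0      # neighbouring class on one side
    after = frequencies[i + 1] if i < len(frequencies) - 1 else 0  # and on the other
    d1 = frequencies[i] - before
    d2 = frequencies[i] - after
    return lower_boundaries[i] + d1 / (d1 + d2) * class_size

lower_boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
frequencies = [4, 8, 17, 26, 20, 15, 10]

mode = mode_grouped(lower_boundaries, frequencies, class_size=5)
print(mode)
```

Here the modal class is 20-24 with d1 = 26 - 17 = 9 and d2 = 26 - 20 = 6, so the mode is 19.5 + (9/15)(5).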
EXAMPLES:
1. A high school teacher at a small private school assigns trigonometry practice problems to be
worked via the net. Students must use a password to access the problems, and the times of
log-in and log-off are automatically recorded for the teacher. At the end of the week, the teacher
examines the amount of time each student spent working the assigned problems. The data are
provided below in minutes.
Data 15 28 25 48 22 43 49 34 22
33 27 25 22 20 39
Mean:
\[ \bar{x} = \frac{15 + 28 + 25 + 48 + 22 + 43 + 49 + 34 + 22 + 33 + 27 + 25 + 22 + 20 + 39}{15} = 30.13 \]
Median: arranged in order, the data are 15 20 22 22 22 25 25 27 28 33 34 39 43 48 49, so
\[ \tilde{x} = 27 \]
Mode:
\[ \hat{x} = 22 \]
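These three results can be verified with Python's standard `statistics` module. The sketch assumes the 15 times are entered exactly as listed in the example.

```python
import statistics

# Minutes spent by each of the 15 students, as listed in the example.
minutes = [15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39]

mean = statistics.mean(minutes)       # sum of the values divided by 15
median = statistics.median(minutes)   # 8th value of the sorted data (n is odd)
mode = statistics.mode(minutes)       # most frequent value

print(round(mean, 2), median, mode)
```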
2. The number of television viewing hours per household
and the prime viewing times are two factors that affect
television advertising income. A random sample of 30
households in a particular viewing area produced the
following estimates of viewing hours per household.
3.0 6.0 7.0 15.0 12.0 6.1
Modal Class 1
\[ \text{Mode} = 69.5 + \frac{3}{3 + 0} (10) = 79.5 \]
Modal Class 2
\[ \text{Mode} = 79.5 + \frac{0}{0 + 10} (10) = 79.5 \]
4. Find the sample mean, sample median and sample mode
CLASS INTERVAL   FREQUENCY   CLASS BOUNDARY   CLASS MARK   <CF   >CF
10-14            8           9.5-14.5         12           12    96
15-19            17          14.5-19.5        17           29    88
20-24            26          19.5-24.5        22           55    71
25-29            20          24.5-29.5        27           75    45
30-34            15          29.5-34.5        32           90    25
Nominal Mode
Ordinal Median
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
For ungrouped data
\[ \sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{\sum_{i=1}^{k} f_i} \]
For grouped data
The variance of a sample of n measurements is defined to be the sum of the
squared deviations of the measurements about their mean \bar{x} divided by (n-1).
The sample variance is denoted by s² and is given by the formula
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
For ungrouped data
\[ s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1} \]
For grouped data
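The difference between the two divisors (N versus n-1) can be seen on ungrouped data. This Python sketch assumes the 15 log-in times from Example 1 as input; the two function names are hypothetical, and the standard `statistics` module is used only as a cross-check.

```python
import statistics

minutes = [15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39]

def population_variance(values):
    """Sum of squared deviations about the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

sigma_squared = population_variance(minutes)
s_squared = sample_variance(minutes)

# Cross-check against the library implementations.
print(sigma_squared, statistics.pvariance(minutes))
print(s_squared, statistics.variance(minutes))
```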
CLASS INTERVAL   FREQUENCY   CLASS MARK   <CF
10-19            3           14.5         3
20-29            2           24.5         5
30-39            3           34.5         8
40-49            4           44.5         12
50-59            5           54.5         17
60-69            11          64.5         28
70-79            14          74.5         42
80-89            14          84.5         56
90-99            4           94.5         60
Range = Highest Upper Class Boundary - Smallest Lower Class Boundary
= 99.5 – 9.5
= 90
\[ \sigma^2 = \frac{\sum f (x - \mu)^2}{\sum f} \]
\[ \sigma^2 = \frac{3(14.5 - 66)^2 + 2(24.5 - 66)^2 + 3(34.5 - 66)^2 + 4(44.5 - 66)^2 + 5(54.5 - 66)^2 + 11(64.5 - 66)^2 + 14(74.5 - 66)^2 + 14(84.5 - 66)^2 + 4(94.5 - 66)^2}{60} \]
\[ \sigma^2 = 432.75 \]
\[ \sigma = \sqrt{432.75} = 20.80264406 \text{ or } 20.80 \]
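The arithmetic in this worked example can be re-run as follows. This Python sketch assumes the class marks and frequencies from the 10-19 through 90-99 table; the sample variance at the end is an extra illustration using the n-1 divisor and is not part of the worked solution.

```python
import math

class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [3, 2, 3, 4, 5, 11, 14, 14, 4]

n = sum(frequencies)
mu = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Sum of f * (x - mu)^2 over all classes.
squared_deviations = sum(f * (x - mu) ** 2 for f, x in zip(frequencies, class_marks))

population_variance = squared_deviations / n
population_sd = math.sqrt(population_variance)
sample_variance = squared_deviations / (n - 1)   # illustration only

print(n, mu, population_variance, round(population_sd, 2))
```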
Fx-991E plus
1. MODE, 3:STAT, 1: 1-VAR
2. INPUT DATA
3. AC
4. SHIFT, 1-STAT, 4:VAR
Choices:
1: n # of data points
2: mean (sample/population mean)
3: population standard deviation
4: sample standard deviation
Fx-991EX
1. MAIN MENU: PRESS 6
2. SELECT 1: 1-VARIABLE
3. INPUT DATA
4. PRESS OPTN, 3: (1-VARIABLE CALC)
SKEWNESS
KURTOSIS
Measures of Shape
Skewness
\[ SK = \frac{\sum \left[ (X_i - \mu) / \sigma \right]^3}{N} \]
where:
Xi - individual reading
σ - standard deviation
μ - mean
N - population size
SK = 0: Symmetric (Normal)
SK > 0: Positively Skewed
SK < 0: Negatively Skewed
Skewness relating to central tendency
negative skew: The left tail is longer than the right tail. It
has relatively few low values. The distribution is said to
be left-skewed or "skewed to the left". Example
(observations): 1, 1000, 1001, 1002, 1003
positive skew: The right tail is longer than the left tail. It has
relatively few high values. The distribution is said to be
right-skewed or "skewed to the right". Example
(observations): 1, 2, 3, 4, 100.
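The sign of the skewness for the two example data sets can be checked with a short Python sketch. It assumes the population form of the coefficient (deviations standardized by σ, averaged over N); the function name is hypothetical.

```python
import math

def skewness(values):
    """Average of the cubed standardized deviations (population form)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sum(((x - mu) / sigma) ** 3 for x in values) / n

left_skewed = [1, 1000, 1001, 1002, 1003]   # one very low value: long left tail
right_skewed = [1, 2, 3, 4, 100]            # one very high value: long right tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(left_skewed), skewness(right_skewed), skewness(symmetric))
```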
\[ k = \frac{\sum \left[ (X_i - \mu) / \sigma \right]^4}{N} \]
where:
Xi - individual reading
σ - standard deviation
μ - mean
N - population size
k = 3: MesoKurtic (Normal)
k > 3: LeptoKurtic
k < 3: PlatyKurtic
A platykurtic data set has a flatter peak around its mean, which causes thin
tails within the distribution. The flatness results from the data being
less concentrated around its mean, due to large variations within
observations.
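The kurtosis coefficient can be computed the same way as the skewness, with a fourth power. This Python sketch assumes the population form (σ and N); the function name and the two small data sets are illustrative only.

```python
import math

def kurtosis(values):
    """Average of the fourth powers of the standardized deviations."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sum(((x - mu) / sigma) ** 4 for x in values) / n

evenly_spread = [1, 2, 3, 4, 5]      # flat, no pronounced peak
with_outlier = [1, 2, 3, 4, 100]     # most values bunched together, one far away

k_flat = kurtosis(evenly_spread)
k_peaked = kurtosis(with_outlier)
print(k_flat, k_peaked)
```

With these inputs the first value falls below 3 (platykurtic) and the second above 3 (leptokurtic).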
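UNGROUPED DATA
Mean:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Median:
\[ \tilde{x} = x_{(n+1)/2} \ \text{(ODD)}, \qquad \tilde{x} = \frac{x_{(n/2)} + x_{(n/2)+1}}{2} \ \text{(EVEN)} \]
Mode: value of X that occurs most often
GROUPED DATA
Mean:
\[ \mu \ \text{or} \ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} \]
Median:
\[ \tilde{x} = Lb_i + \left( \frac{\frac{n}{2} - {<}cf_{i-1}}{f_i} \right) (CS) \]
Mode: the midpoint of the cell with the highest frequency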
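VARIABILITY
UNGROUPED DATA
RANGE: R = HV - LV
VARIANCE:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
Population variance
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
Sample variance
STANDARD DEVIATION:
\[ \sigma = \sqrt{\sigma^2}, \qquad s = \sqrt{s^2} \]
GROUPED DATA
RANGE: R = Highest Upper Class Boundary - Lowest Lower Class Boundary
VARIANCE:
\[ \sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{\sum_{i=1}^{k} f_i} \]
Population variance
\[ s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1} \]
Sample variance
STANDARD DEVIATION:
\[ \sigma = \sqrt{\sigma^2}, \qquad s = \sqrt{s^2} \]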
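SUMMARY:
SHAPE
GROUPED DATA
\[ SK = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^3}{\sum_{i=1}^{k} f_i \, \sigma^3} \qquad SK = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^3}{\sum_{i=1}^{k} f_i \, s^3} \]
KURTOSIS
UNGROUPED DATA
\[ K = \frac{\sum_{i=1}^{N} (x_i - \mu)^4}{N \sigma^4} \qquad K = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} \]
K = 3, mesokurtic
K > 3, leptokurtic
K < 3, platykurtic
GROUPED DATA
\[ K = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^4}{\sum_{i=1}^{k} f_i \, \sigma^4} \qquad K = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^4}{\sum_{i=1}^{k} f_i \, s^4} \]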
PRACTICAL SIGNIFICANCE OF THE
STANDARD DEVIATION
[Figure: bell-shaped curve with tail areas marked 2.35% and 0.15% on each side]
Examples:
1. Let X be the number of screws delivered to a box by an automatic filling
device. Assume µ = 1,000, σ² = 25. There are problems with too many
(giving away free product) or too few (potentially irritated customers)
screws in a box.
a) How many σ-units to the right of µ is 1009?
b) What X value is 2.6 σ-units to the left of µ?
c) Use Chebyshev’s inequality to find a bound on P[994 < X < 1006].
c) To solve for k:
\[ k = \frac{\text{limit} - \mu}{\sigma} = \frac{994 - 1000}{5} = -1.2 \]
\[ \text{Area} = 1 - \frac{1}{k^2} = 1 - \frac{1}{(-1.2)^2} = 0.3056 \]
\[ P(994 < X < 1006) \ge 0.3056 \]
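All three parts of this example reduce to a few lines of arithmetic. The Python sketch below assumes µ = 1000 and σ² = 25 as stated in the problem; the variable names are illustrative, and parts a) and b) are computed here from the definition of a σ-unit rather than taken from a printed solution.

```python
mu = 1000
sigma = 25 ** 0.5                 # standard deviation is the square root of the variance

# a) Distance of 1009 from the mean, measured in sigma-units.
units_right = (1009 - mu) / sigma

# b) The X value lying 2.6 sigma-units to the left of the mean.
x_left = mu - 2.6 * sigma

# c) Chebyshev: P(|X - mu| < k * sigma) >= 1 - 1 / k^2, with k from the limit 1006.
k = (1006 - mu) / sigma
chebyshev_bound = 1 - 1 / k ** 2

print(units_right, x_left, k, round(chebyshev_bound, 4))
```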
2. The mean life of a certain brand of auto batteries is 44
months with a standard deviation of three months. Assume
that the lives of all auto batteries of this brand have a
bell-shaped distribution. Using the empirical rule, find the
percentage of auto batteries of this brand that have a life
of
a. 41 to 47 months b. 41 to 50 months c. 35 to 53 months
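[Figure: bell-shaped curve. x-axis marks: 35, 38, 41, 44, 47, 50, 53 (months). Areas between consecutive marks, left to right: 0.0235, 0.135, 0.34, 0.34, 0.135, 0.0235; area in each tail beyond 35 and 53: 0.0015]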
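The three percentages can be obtained by adding the band areas of the empirical rule. This Python sketch assumes µ = 44, σ = 3 and the six band areas shown on the curve; the function name is hypothetical, and it only handles limits that fall a whole number of standard deviations from the mean, between -3σ and +3σ.

```python
mu, sigma = 44, 3

# Areas of the one-sigma-wide bands from mu - 3*sigma up to mu + 3*sigma.
band_areas = [0.0235, 0.135, 0.34, 0.34, 0.135, 0.0235]

def empirical_percentage(low, high):
    """Percentage of values between two limits lying on whole sigma marks."""
    z_low = round((low - mu) / sigma)     # e.g. 41 is -1 sigma
    z_high = round((high - mu) / sigma)   # e.g. 50 is +2 sigma
    # Band j covers (j - 3) sigma to (j - 2) sigma, so shift the indices by 3.
    return 100 * sum(band_areas[z_low + 3:z_high + 3])

a = empirical_percentage(41, 47)
b = empirical_percentage(41, 50)
c = empirical_percentage(35, 53)
print(round(a, 2), round(b, 2), round(c, 2))
```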