Numerical Descriptive
Measures
Learning Outcome
Explore, describe and compare data sets
using the basic tools after collecting the
data.
3.5 Skewness.
3.1 Measures of central tendency for
ungrouped data
Represent a data set by some numerical
measures (typical values).
A single value that summarizes a set of data.
It locates the centre of the values.
Give the centre of a histogram or a frequency
distribution curve.
Measures of central tendency (cont'd)
Mode
Mean
Median
Median is the value of the middle term in a
data set that has been ranked in increasing
or decreasing order.
Median is the value of the [(n+1)/2]th term in
a ranked data set;
n = total number of elements in the set.
Note:
1. If n is odd, then median is the value of the middle term in the ranked data.
2. If n is even, then median is the average value of the two middle terms.
Median (cont'd)
Example 3.1
i) Find the median of set A = { 10, 5, 19, 8, 3 }
and set B = { 2, 7, 3, 6, 4, 5 }.
Solution:
Median (cont'd)
ii) Find the median of set C = {3, 5, 8, 10, 90}
Compare the median of set C with the
median of set A, and state your conclusion.
Solution:
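The medians in Example 3.1 can be checked with a short Python sketch. Python and its standard `statistics` module are not part of the original slides, and the variable names are illustrative only.

```python
from statistics import median

# Data sets from Example 3.1
set_a = [10, 5, 19, 8, 3]
set_b = [2, 7, 3, 6, 4, 5]
set_c = [3, 5, 8, 10, 90]

# median() ranks the data internally before picking the middle term(s)
median_a = median(set_a)  # n = 5 (odd): middle term of 3, 5, 8, 10, 19
median_b = median(set_b)  # n = 6 (even): average of the two middle terms, 4 and 5
median_c = median(set_c)  # the extreme value 90 does not move the middle term

print(median_a, median_b, median_c)
```

Sets A and C have the same median even though C contains the extreme value 90, which illustrates that the median is not affected by extreme values.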
Median (cont'd)
For ungrouped data in the form of frequency
distribution of single-valued classes
Median can be found either from ungrouped
frequency distribution or from the cumulative
frequency distribution.
Median (cont'd)
Example 3.2
Find the median of the following frequency
distribution.
No. of children    0    1    2    3    4    5
Frequency          3    5   12    9    4    2
Median (cont'd)
Solution:
No. of children         < 0   < 1   < 2   < 3   < 4   < 5   < 6
Cumulative frequency      0     3     8    20    29    33    35
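As a rough check of Example 3.2, the sketch below locates the [(n+1)/2]th term from the cumulative frequencies. It assumes n is odd, as it is here; for an even n the two middle terms would have to be averaged. The variable names are illustrative.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4, 5]      # number of children
freqs = [3, 5, 12, 9, 4, 2]      # frequency of each value

n = sum(freqs)                   # total number of observations
position = (n + 1) / 2           # rank of the median term (assumes n is odd)
cumulative = list(accumulate(freqs))  # running totals of the frequencies

# The median is the first value whose cumulative frequency reaches the position
median_children = next(v for v, cf in zip(values, cumulative) if cf >= position)

print(n, position, median_children)
```

With n = 35 the median is the 18th term, which falls in the group of families with 2 children.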
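Mean
The arithmetic mean of n values $x_1, x_2, \ldots, x_n$:
$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$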
Mean (cont'd)
Example 3.5
Find the arithmetic mean for the data set
{ 158, 189, 265, 127, 191 }
Solution:
Mean (cont'd)
Find the arithmetic mean for the data set
{ 158, 189, 265, 127, 1091 }. Compare your
answer with the answer of example 3.5.
Solution:
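Both means can be verified with a minimal Python sketch (the `statistics` module and the variable names are assumptions of this sketch, not part of the slides).

```python
from statistics import mean

data_original = [158, 189, 265, 127, 191]      # Example 3.5
data_with_outlier = [158, 189, 265, 127, 1091]  # same set, last value replaced by 1091

mean_original = mean(data_original)          # 930 / 5
mean_with_outlier = mean(data_with_outlier)  # 1830 / 5

print(mean_original, mean_with_outlier)
```

Replacing a single value by an extreme one moves the mean from 186 to 366, so the mean is sensitive to extreme values.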
Mean (cont'd)
Note:
The mean does not necessarily take one of the values in the
original data.
The mean is influenced by extreme values.
Mean (cont'd)
For ungrouped data in the form of frequency
distribution of single-valued classes.
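$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{n} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i = \frac{\sum f_i x_i}{\sum f_i}$
where $k$ is the number of distinct values and $n = \sum f_i$.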
Mean (cont'd)
Example 3.6
Find the mean of the following frequency
distribution.
x_i    2    5    6    8
f_i    1    3    4    2
Mean (cont'd)
Solution:
x_i        2    5    6    8
f_i        1    3    4    2
f_i x_i
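The missing row and the mean for Example 3.6 can be worked out with a small Python sketch; the variable names are illustrative.

```python
x = [2, 5, 6, 8]   # distinct values x_i
f = [1, 3, 4, 2]   # frequencies f_i

products = [fi * xi for fi, xi in zip(f, x)]  # the f_i * x_i row
n = sum(f)                                    # total frequency
mean_x = sum(products) / n                    # sum(f_i * x_i) / sum(f_i)

print(products, n, mean_x)
```

The products are 2, 15, 24 and 16, which sum to 57, so the mean is 57/10 = 5.7.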
Mean (cont'd)
Effect of outliers or extreme values
Weight (kg)   60 – 62   63 – 65   66 – 68   69 – 71   72 – 74
Frequency        3         4         5         6         2
Solution: Method 1
Estimate the mode graphically from the histogram.
[Figure: histogram of frequency against weight, bars centred at 61, 64, 67, 70 and 73]
Modal class = 69 – 71
Estimated mode = 69
Mode
For grouped data,
$\text{Mode} = L_m + \frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)} \times C = L_m + \frac{f_m - f_b}{2 f_m - f_b - f_a} \times C$
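A minimal sketch of the grouped-data mode formula, applied to the weight data (frequencies 3, 4, 5, 6, 2 with class width 3). The slides do not define the symbols in the surviving text, so the reading used here is an assumption: L_m is the lower boundary of the modal class, f_m its frequency, f_b and f_a the frequencies of the classes before and after it, and C the class width. The function name is hypothetical.

```python
def grouped_mode(lower_boundary, f_m, f_b, f_a, width):
    """Mode = L_m + (f_m - f_b) / ((f_m - f_b) + (f_m - f_a)) * C."""
    return lower_boundary + (f_m - f_b) / ((f_m - f_b) + (f_m - f_a)) * width

# Modal class 69 - 71: lower boundary 68.5, frequency 6,
# preceded by a class of frequency 5 and followed by one of frequency 2
mode_weight = grouped_mode(68.5, f_m=6, f_b=5, f_a=2, width=3)

print(round(mode_weight, 1))
```

The result, about 69.1 kg, agrees with the value of roughly 69 read from the histogram.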
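Weight (kg)   Frequency   Cumulative frequency
60 – 62           3               3
63 – 65           4               7
66 – 68           5              12
69 – 71           6              18
72 – 74           2              20

$T_{\tilde{x}} = \frac{20 + 1}{2} = 10.5$
$L_B = 65.5$, $F_B = 7$, $f_m = 5$, $C = 3$
$\tilde{x} = 65.5 + \frac{\frac{1}{2}(20) - 7}{5} \times 3 = 67.3$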
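The grouped-data median above can be reproduced with a short Python sketch. The code locates the median class as the first class whose cumulative frequency reaches n/2, which is an assumption about how the class is chosen; the variable names are illustrative.

```python
from itertools import accumulate

lower_boundaries = [59.5, 62.5, 65.5, 68.5, 71.5]  # lower class boundaries
freqs = [3, 4, 5, 6, 2]                            # class frequencies
width = 3                                          # class width C

n = sum(freqs)
cumulative = list(accumulate(freqs))               # 3, 7, 12, 18, 20

# Median class: first class whose cumulative frequency reaches n / 2
idx = next(i for i, cf in enumerate(cumulative) if cf >= n / 2)
cum_before = cumulative[idx - 1] if idx > 0 else 0  # F_B

median_weight = lower_boundaries[idx] + (n / 2 - cum_before) / freqs[idx] * width

print(round(median_weight, 1))
```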
Using cumulative frequency curve
[Figure: cumulative frequency curve; horizontal axis marked at 59.5, 62.5, 65.5, 68, 68.5, 71.5 and 74.5]
Mean (cont'd)
For grouped data in the form of frequency
distribution
Suppose data are grouped into k class
intervals, and
$f_i$ = the frequency of class $i$, $N = \sum f_i$ = population size
$m_i$ = the midpoint of class $i$, $n = \sum f_i$ = sample size
Mean for population data:
$\mu = \frac{\sum f_i m_i}{N}$
Mean for sample data:
$\bar{X} = \frac{\sum f_i m_i}{n}$
Mean (cont'd)
Example 3.9
Find the mean of the following frequency
distribution.
$\sum f_i m_i = 832$, $n = 50$
$\bar{X} = \frac{\sum f_i m_i}{n} = 16.64$
Using Excel to find Mean
Relationships among the mean,
median and mode
Symmetric histogram and frequency curve with
one peak.
The values of the mean, median and mode are identical
and lie at the centre of the distribution.
Relationships among the mean,
median and mode (cont'd)
Histogram and frequency curve skewed to the
right or positively skewed.
The value of the mean is the largest, mode is the smallest
and median lies between these two.
Relationships among the mean,
median and mode (cont'd)
Histogram and frequency curve skewed to
the left or negatively skewed.
The value of the mean is the smallest, mode is
the largest and median lies between these two.
Relationships among the mean,
median and mode (cont'd)
Nominal (categorical) data: Mode
Ordinal data: Median
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$
Measures of dispersion for ungrouped data
Sample variance: $s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$
Measures of dispersion for ungrouped data
Standard Deviation
The standard deviation is the positive square root
of the variance
Sample standard deviation: $s = \sqrt{s^2}$
Note:
A small standard deviation means that the data are
distributed closely to their mean.
A large standard deviation means that the data are widely
scattered about their mean.
It is influenced by extreme values
Measures of dispersion for ungrouped data
Example 3.11:
The data show the daily salary of all 6 employees
of a small company.
29.50, 16.50, 35.40, 21.30, 49.70, 24.60
Calculate the variance and standard deviation for
these data.
Measures of dispersion for ungrouped data
Solution:
Mean, $\mu$ =
$x_i$      $x_i - \mu$      $(x_i - \mu)^2$      $x_i^2$
29.50
16.50
35.40
21.30
49.70
24.60
Measures of dispersion for ungrouped data
Method 1
Population Variance =
Method 2
Population Variance =
Sample variance =
$n = 5$, $\sum x_i = 312$, $\sum x_i^2 = 20100$
$\text{MAD} = \frac{\sum |x - \bar{X}|}{n}$
Example 3.13
Calculate the mean absolute deviation of the data
[29.50, 16.50, 35.40, 21.30, 49.70, 24.60] in Ex
3.11.
Solution
$|x - \bar{X}|$:   0.00   13.00   5.90   8.20   20.20   4.90     (total 52.2)
$\text{MAD} = \frac{1}{6}(52.2) = 8.7$
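The mean absolute deviation in Example 3.13 can be checked with a few lines of Python; the variable names are illustrative.

```python
salaries = [29.50, 16.50, 35.40, 21.30, 49.70, 24.60]

mean_salary = sum(salaries) / len(salaries)               # 177 / 6
abs_deviations = [abs(x - mean_salary) for x in salaries]  # |x - mean| for each value
mad = sum(abs_deviations) / len(salaries)                  # average absolute deviation

print(round(mean_salary, 2), round(mad, 2))
```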
Measures of dispersion for ungrouped data
Solution:
Mean, $\mu = \frac{\sum x_i}{n} = \frac{177}{6} = 29.5$

$x_i$      $x_i - \mu$    $(x_i - \mu)^2$    $x_i^2$
29.50        0.00            0.00          870.25
16.50      -13.00          169.00          272.25
35.40        5.90           34.81         1253.16
21.30       -8.20           67.24          453.69
49.70       20.20          408.04         2470.09
24.60       -4.90           24.01          605.16
Total        0.00          703.10         5924.60
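The column totals lead directly to the variances asked for in Example 3.11. The sketch below uses Python's standard `statistics` module (an assumption of this sketch, not a tool used in the slides) and also checks the shortcut form of the population variance.

```python
from math import sqrt
from statistics import pvariance, variance

salaries = [29.50, 16.50, 35.40, 21.30, 49.70, 24.60]
n = len(salaries)

pop_var = pvariance(salaries)    # divides the sum of squared deviations by N
sample_var = variance(salaries)  # divides the sum of squared deviations by n - 1

# Shortcut form: mean of the squares minus the square of the mean
shortcut_pop_var = sum(x * x for x in salaries) / n - (sum(salaries) / n) ** 2

pop_sd = sqrt(pop_var)
sample_sd = sqrt(sample_var)

print(round(pop_var, 2), round(sample_var, 2))
print(round(pop_sd, 2), round(sample_sd, 2))
```

Treating the six salaries as a population gives 703.1/6, about 117.18; treating them as a sample gives 703.1/5 = 140.62.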
Measures of dispersion for grouped data
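Variance
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{k} f_i (m_i - \mu)^2 = \frac{\sum f_i m_i^2}{N} - \left(\frac{\sum f_i m_i}{N}\right)^2$
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (m_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i m_i\right)^2\right]$
where $k$ is the number of class intervals.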
Measures of dispersion for grouped data
Example:
Find the variance of the following frequency
distribution if it represents
a) a population b) a sample
Height (m)   20 – 22   23 – 25   26 – 28   29 – 31   32 – 34
Frequency       3         6        12         9         2
Measures of dispersion for grouped data
Solution:
$s^2 = \frac{1}{n-1}\left[\sum f_i m_i^2 - \frac{1}{n}\left(\sum f_i m_i\right)^2\right]$
$= \frac{1}{31}\left[23805 - \frac{1}{32}(867)^2\right] = 10.15222$
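The sums in this solution can be recomputed from the class midpoints 21, 24, 27, 30 and 33. In the sketch below the midpoints give Σ f_i m_i = 867, which is the value consistent with the stated result 10.15222, and the population variance for part a) follows from the same sums. Variable names are illustrative.

```python
midpoints = [21, 24, 27, 30, 33]   # midpoints of 20-22, 23-25, ..., 32-34
freqs = [3, 6, 12, 9, 2]

n = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))       # sum of f_i * m_i
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))  # sum of f_i * m_i^2

sum_sq_dev = sum_fm2 - sum_fm ** 2 / n   # sum of f_i * (m_i - mean)^2
pop_var = sum_sq_dev / n                 # a) data treated as a population
sample_var = sum_sq_dev / (n - 1)        # b) data treated as a sample

print(n, sum_fm, sum_fm2)
print(round(pop_var, 5), round(sample_var, 5))
```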
Use of Standard Deviation
Find the proportion or percentage of the total
observations that fall within a given interval
about the mean by using the following
theorem.
Chebyshev’s Theorem
For any number k greater than 1, at least
$(1 - 1/k^2) \times 100\%$ of the data values lie within k
standard deviations of the mean.
Use of Standard Deviation
Example 3.15:
For a statistics class, the mean for the midterm
scores is 75 and the standard deviation is 8.
Find the percentage of students who scored
between 59 and 91.
Solution:
$\bar{x} = 75$, $s = 8$, $k_L = \frac{59 - 75}{8} = -2$, $k_H = \frac{91 - 75}{8} = 2$
$k = 2$, where $1 - \frac{1}{2^2} = 0.75$
According to Chebyshev's Theorem, at least 75% of
the students scored between 59 and 91.
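The Chebyshev bound in Example 3.15 can be expressed as a small Python sketch. The function name `chebyshev_lower_bound` is hypothetical and chosen for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

mean_score, sd_score = 75, 8
low, high = 59, 91

# Both limits are the same number of standard deviations from the mean
k = (high - mean_score) / sd_score
assert k == (mean_score - low) / sd_score

bound = chebyshev_lower_bound(k)
print(f"at least {bound:.0%} of scores lie between {low} and {high}")
```

The bound is a guaranteed minimum for any distribution, not an estimate of the actual percentage.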
Use of Standard Deviation
Empirical Rule
For a bell-shaped distribution, approximately
68% of the observations lie within one standard
deviation of the mean.
95% of the observations lie within two standard
deviations of the mean.
99.7% of the observations lie within three standard
deviations of the mean.
Use of Standard Deviation
Example 3.16:
Recalculate the percentage in Example 3.15 if
the midterm scores follow a bell-shaped
distribution.
Solution:
With k = 2, by the empirical rule, approximately 95%
of the students scored between 59 and 91.
Coefficient of variation (V )
Coefficient of variation (V ) expresses the
standard deviation as a percentage of what is
being measured, at least on the average.
$V = \frac{\sigma}{\mu} \times 100\%$ or $V = \frac{s}{\bar{x}} \times 100\%$
If V is smaller for one set of data than another,
there is relatively less variability in the first set of
data, and we say that it is “more consistent”.
Coefficient of variation (V )
Example 3.17:
During the past few months, one runner
averaged 12 miles per week with a standard
deviation of 2 miles, while another runner
averaged 25 miles per week with a standard
deviation of 3 miles.
Which of the two runners is relatively more
consistent in his weekly running habits?
Coefficient of variation (V )
Solution:
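One way to work Example 3.17 is to compute the coefficient of variation for each runner, as in the following Python sketch. The function and variable names are illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

v_first = coefficient_of_variation(sd=2, mean=12)    # runner averaging 12 miles
v_second = coefficient_of_variation(sd=3, mean=25)   # runner averaging 25 miles

# The smaller V indicates relatively less variability
more_consistent = "second runner" if v_second < v_first else "first runner"

print(round(v_first, 2), round(v_second, 2), more_consistent)
```

V is about 16.67% for the first runner and 12% for the second, so the second runner is relatively more consistent even though his standard deviation is larger in absolute terms.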
[Figure: quartiles Q1, Q2 and Q3 marked on a ranked data set]
Sound difficult?
3.75>3.5=4
2.25<2.5=2
To Find The Quartiles of Ungrouped
Data (Cont’d)
Weight (kg)            60 – 62   63 – 65   66 – 68   69 – 71   72 – 74
Frequency                 3         4         5         6         2
Cumulative frequency      3         7        12        18        20
Example 3.19 (Cont’d)
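$\frac{n}{4} = \frac{20}{4} = 5$,
$Q_1 = 62.5 + \frac{3}{4}\left(\frac{20}{4} - 3\right) = 64$
$\frac{3n}{4} = \frac{3(20)}{4} = 15$,
$Q_3 = 68.5 + \frac{3}{6}(15 - 12) = 70$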
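The quartile calculations in Example 3.19 follow one interpolation pattern, sketched below in Python. The helper `grouped_quantile` is hypothetical; it assumes the quartile class is the first class whose cumulative frequency reaches the target rank (n/4 for Q1, 3n/4 for Q3).

```python
from itertools import accumulate

lower_boundaries = [59.5, 62.5, 65.5, 68.5, 71.5]  # lower class boundaries
freqs = [3, 4, 5, 6, 2]                            # class frequencies
width = 3                                          # class width

def grouped_quantile(p):
    """Interpolate the value below which a proportion p of the data lies."""
    n = sum(freqs)
    target = p * n                                  # e.g. n/4 for Q1
    cumulative = list(accumulate(freqs))
    # Class containing the target rank
    idx = next(i for i, cf in enumerate(cumulative) if cf >= target)
    cum_before = cumulative[idx - 1] if idx > 0 else 0
    return lower_boundaries[idx] + (target - cum_before) / freqs[idx] * width

q1 = grouped_quantile(0.25)
q3 = grouped_quantile(0.75)

print(q1, q3)
```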
3.5 Skewness
There are three shapes commonly observed:
symmetric, positively skewed and negatively
skewed.
In a symmetric set of observations, the mean and
median are equal and the data values are evenly
spread around these values.
A set of values is skewed to the right or positively
skewed if there is a single peak and the values
extend much further to the right of the peak than to
the left of the peak. The mean is larger than the
median.
In a negatively skewed distribution, there is a single
peak but the observations extend further to the left.
The mean is smaller than the median.
Skewness
To calculate skewness of a distribution or a set
of data, we can use Pearson’s coefficient of
skewness
$sk = \frac{3(\text{Mean} - \text{Median})}{s}$
Note:
The value ranges from -3 to 3.
If sk < 0, then the distribution is negatively skewed.
If sk > 0, then the distribution is positively skewed.
A value of 0 occurs when the distribution is
symmetrical and there is no skewness present.
Note:
1. The more negative sk is, the more negatively
skewed the distribution.
2. The more positive sk is, the more positively
skewed the distribution.
Skewness
Example 3.21
The following are the earnings per share for a sample of
15 software companies for the year 2000. The
earnings per share are arranged from smallest to
largest.
0.09 0.13 0.41 0.51 1.12 1.20 1.49 3.18 3.50 6.36
7.83 8.92 10.13 12.99 16.40
Compute the mean, median and standard deviation.
Find the coefficient of skewness using Pearson’s
estimate. What is your conclusion regarding the
shape of the distribution?
Example 3.21 (Solution)
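A possible way to work Example 3.21 is sketched below in Python, using the standard `statistics` module (an assumption of this sketch; the slides do not name a tool). Since the 15 companies are described as a sample, the sample standard deviation is used.

```python
from statistics import mean, median, stdev

earnings = [0.09, 0.13, 0.41, 0.51, 1.12, 1.20, 1.49, 3.18, 3.50, 6.36,
            7.83, 8.92, 10.13, 12.99, 16.40]

mean_eps = mean(earnings)      # arithmetic mean
median_eps = median(earnings)  # 8th of the 15 ranked values
sd_eps = stdev(earnings)       # sample standard deviation (divisor n - 1)

# Pearson's coefficient of skewness
sk = 3 * (mean_eps - median_eps) / sd_eps

print(round(mean_eps, 2), median_eps, round(sd_eps, 2), round(sk, 3))
```

The mean (about 4.95) exceeds the median (3.18) and sk is roughly 1.02, so the earnings are positively skewed.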
Kurtosis
Measures the relative concentration of values in the
center of the distribution, as compared with the tails.
[Figure: example histograms; panel titles include "Left Skewed" and "Left Skewed and an outlier"]
Review
3.1 Measures of central tendency for ungrouped data.
3.5 Skewness.
The End
Chapter 3