Note Chapter 3
Mean
The mean of a data set is the sum of the data entries divided by the number of entries.
Population mean, $\mu = \dfrac{\sum x}{N}$, where x = data item, N = population size
Sample mean, $\bar{x} = \dfrac{\sum x}{n}$, where x = data item, n = sample size
Example:
Find the mean of the following sample data.
(a) -2, 3, 4, 5, 10, 16
Answer: $\bar{x} = 6$
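As a quick check of the answer above, the sample mean formula can be sketched in Python. The function name `sample_mean` is chosen here for illustration and is not part of the notes.

```python
def sample_mean(values):
    """Sum of the data entries divided by the number of entries."""
    return sum(values) / len(values)

data = [-2, 3, 4, 5, 10, 16]
print(sample_mean(data))  # 6.0
```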
Median
The median of a data set is the value that lies in the middle of the data when the data set
is ordered. If the data set has an odd number of entries, the median is the middle data
entry. If the data set has an even number of entries, the median is the mean of the two
middle data entries.
Median position = $\left( \dfrac{n + 1}{2} \right)$th
Example:
Find the median of the following sets of data.
(a) 7, 16, 8, 25, 3, 20, 10
Answer: Median = 10
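The median rule can be sketched as follows. This is a minimal illustration, assuming the data fit in a list; the helper name `median` is hypothetical. It sorts the data, uses the (n + 1)/2 position for an odd number of entries, and averages the two middle entries otherwise.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2        # 1-based median position
        return ordered[position - 1]   # convert to 0-based index
    # Even n: mean of the two middle entries
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([7, 16, 8, 25, 3, 20, 10]))  # 10
```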
Mode
The mode of a data set is the data entry that occurs with the greatest frequency.
Sometimes a mode may not exist, and even if it exists, it may not be unique. If two entries
occur with the same greatest frequency, each entry is a mode and the data set is called
bimodal. When more than two entries occur with the same greatest frequency, each entry
is a mode and the data set is called multimodal.
Example:
Find the mode for each of the following data sets.
i) 3, 4, 1, 3, 5, 3, 8, 2, 9, 3, 2
Mode = 3
ii) 2, 8, 1, 2, 9, 10, 8, 11
Mode = 2 and 8 (bimodal)
iii) 3, 6, 7, 6, 3, 7
No mode
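The three examples can be reproduced with a short Python sketch. It assumes the convention implied by example iii): when every distinct entry occurs equally often, the data set is reported as having no mode. The function name `modes` is illustrative only.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: several distinct entries, all equally frequent -> no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([3, 4, 1, 3, 5, 3, 8, 2, 9, 3, 2]))  # [3]
print(modes([2, 8, 1, 2, 9, 10, 8, 11]))         # [2, 8]
print(modes([3, 6, 7, 6, 3, 7]))                 # []
```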
Quartiles
Quartiles divide a data set, arranged in ascending order, into four equal parts.
Example:
Find the first, second and third quartiles for the following data.
(a) 5, 3, 2, 14, 19, 8, 10
Answer: $Q_1 = 3$, $Q_2 = 8$, $Q_3 = 14$
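The notes do not state which quartile method is used. The sketch below assumes the "median of each half" method (the overall median is excluded from both halves when n is odd), which reproduces the answer above. Other conventions, such as interpolation, can give different values. The function names are hypothetical.

```python
def _median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle entry when n is odd
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return _median(lower), _median(ordered), _median(upper)

print(quartiles([5, 3, 2, 14, 19, 8, 10]))  # (3, 8, 14)
```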
3.2 Measures of central tendency for grouped data
Mean = $\dfrac{\sum f x}{n}$, where f = frequency
x = class mark
n = total frequency
Median = $L_m + \left( \dfrac{\frac{n}{2} - CF_{m-1}}{f_m} \right) w_m$
Mode = $L_m + \left( \dfrac{D_1}{D_1 + D_2} \right) w_m$
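The grouped-data formulas can be tried out on the frequency table that appears in Section 3.4 of these notes. The symbols are not defined in the notes, so the sketch assumes their usual meanings: L_m is the lower class boundary of the median (or modal) class, CF_{m-1} is the cumulative frequency before the median class, f_m is the median class frequency, w_m is the class width, and D_1 and D_2 are the differences between the modal class frequency and the frequencies of the classes before and after it. It also assumes boundaries lie halfway across the gap between class limits. All function names are illustrative.

```python
# (lower limit, upper limit, frequency), taken from the table in Section 3.4
CLASSES = [(1, 3, 5), (4, 6, 3), (7, 9, 2), (10, 12, 1), (13, 15, 6), (16, 18, 4)]

def grouped_mean(classes):
    n = sum(f for _, _, f in classes)
    # Class mark x is the midpoint of the class limits
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes, gap=1):
    n = sum(f for _, _, f in classes)
    width = classes[1][0] - classes[0][0]   # w_m, assuming equal class widths
    cumulative = 0                          # CF_{m-1}
    for lo, _, f in classes:
        if cumulative + f >= n / 2:         # first class reaching n/2 is the median class
            return (lo - gap / 2) + (n / 2 - cumulative) / f * width
        cumulative += f

def grouped_mode(classes, gap=1):
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))             # modal class (first one if tied)
    d1 = freqs[m] - (freqs[m - 1] if m > 0 else 0)
    d2 = freqs[m] - (freqs[m + 1] if m + 1 < len(freqs) else 0)
    width = classes[1][0] - classes[0][0]
    return (classes[m][0] - gap / 2) + d1 / (d1 + d2) * width

print(round(grouped_mean(CLASSES), 3))    # 9.714
print(round(grouped_median(CLASSES), 3))  # 11.0
print(round(grouped_mode(CLASSES), 3))    # 14.643
```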
Example:
Find the mean, median and mode for the following data. Round your answer to three
decimal places.
3.3 Measures of variation of ungrouped data
A measure of variation (measure of dispersion) describes how spread out or scattered a
set of numeric data is. It helps us to study the extent to which items vary from one
another and from central values.
Example:
         Series A   Series B   Series C
           100        100          1
           100        105        489
           100        102          2
           100        103          3
           100         90          5
Total      500        500        500
Series A: None of the items differs from the mean, so there is no variation. (s = 0)
Series B: One item is equal to the mean; the others vary from it by small amounts.
(s = 5.87)
Series C: Not a single item is equal to the mean, and the items vary widely from one
another. (s = 217.46)
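The three standard deviations quoted above can be checked with Python's standard `statistics` module, whose `stdev` function computes the sample standard deviation (dividing by n - 1). All three series have the same mean of 100, which is why the mean alone cannot distinguish them.

```python
import statistics

series = {
    "A": [100, 100, 100, 100, 100],
    "B": [100, 105, 102, 103, 90],
    "C": [1, 489, 2, 3, 5],
}

for name, values in series.items():
    s = statistics.stdev(values)  # sample standard deviation
    print(name, round(s, 2))
# A 0.0
# B 5.87
# C 217.46
```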
The difference in the spread can be determined by a measure of dispersion. Four
common measures of dispersion are the range, inter-quartile range, variance and standard
deviation. The range is not a good measure of dispersion because it is influenced by
extreme values and its calculation does not cover all observations. Of the four, the
variance and standard deviation are the most useful and widely used measures of
dispersion because, although they are also influenced by extreme values, their
calculation covers all the observations.
Generally, a large standard deviation for a data set means that the spread of the
observations around the mean is also large.
Range
The range of a data set is the difference between the maximum and minimum data entries
in the set.
Range = (Maximum data entry) - (Minimum data entry)
Example:
Chicago Corporation hired 10 graduates. The starting salaries (in $'00) are shown. Find
the range of the starting salaries.
Salaries 41 38 39 45 47 41 44 41 37 42
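The notes give no answer for this example; a short sketch of the calculation, using the salary figures as listed:

```python
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]  # in $'00

# Range = (maximum data entry) - (minimum data entry)
data_range = max(salaries) - min(salaries)
print(data_range)  # 10, i.e. $1,000
```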
Inter-quartile range
The inter-quartile range is defined as the difference between the third quartile and the
first quartile.
Inter-quartile range = $Q_3 - Q_1$
Example:
Find the inter-quartile range for the following data.
3, 2, 4, 10, 4, 8, 9, 10, 19
Answer: Inter-quartile range = 6.5
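The working behind this answer can be laid out as follows, assuming quartiles are taken as the medians of the lower and upper halves of the ordered data (with the overall median excluded), which is the method consistent with the stated result:

```latex
\begin{align*}
\text{Ordered data: } & 2,\ 3,\ 4,\ 4,\ 8,\ 9,\ 10,\ 10,\ 19 \\
Q_2 &= 8 \quad \text{(5th of 9 entries)} \\
Q_1 &= \frac{3 + 4}{2} = 3.5 \quad \text{(median of } 2, 3, 4, 4\text{)} \\
Q_3 &= \frac{10 + 10}{2} = 10 \quad \text{(median of } 9, 10, 10, 19\text{)} \\
Q_3 - Q_1 &= 10 - 3.5 = 6.5
\end{align*}
```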
Standard deviation and variance (Ungrouped data)
Population variance, $\sigma^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
Sample variance, $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Example:
Find the variance and standard deviation for the sample data. Round answer to three
decimal places:
5, 2, 3, 4, 5, 6, 3.
Answer: $s^2 = 2$, $s = 1.414$
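The sample variance formula (sum of squares minus the squared sum over n, all divided by n - 1) can be sketched directly in Python. The standard deviation is taken as the positive square root of the variance. The function name `sample_variance` is illustrative.

```python
import math

def sample_variance(values):
    n = len(values)
    sum_x = sum(values)                    # sum of x
    sum_x2 = sum(x * x for x in values)    # sum of x squared
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [5, 2, 3, 4, 5, 6, 3]
s2 = sample_variance(data)
s = math.sqrt(s2)
print(round(s2, 3), round(s, 3))  # 2.0 1.414
```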
Example:
Calculate the standard deviations for the following sets of sample data. Hence, determine
which one is more dispersed about the mean than the other.
Data A: 2, 7, 10, 9, 2, 5, 16
Data B: 10, 8, 14, 20, 40, 32, 1, 4, 8, 36, 12, 32
Answer:
Data A: s = 4.957
Data B: s = 13.501
Data B is more dispersed than data A.
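The intermediate sums behind these two answers, using the sample variance formula for ungrouped data, work out as:

```latex
\begin{align*}
\text{Data A: } & n = 7,\quad \sum x = 51,\quad \sum x^2 = 519 \\
s^2 &= \frac{519 - \frac{51^2}{7}}{6} \approx 24.571, \qquad s \approx 4.957 \\
\text{Data B: } & n = 12,\quad \sum x = 217,\quad \sum x^2 = 5929 \\
s^2 &= \frac{5929 - \frac{217^2}{12}}{11} \approx 182.265, \qquad s \approx 13.501
\end{align*}
```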
3.4 Measures of variation of grouped data
Population variance, $\sigma^2 = \dfrac{\sum f x^2 - \frac{(\sum f x)^2}{N}}{N}$
Sample variance, $s^2 = \dfrac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}$
Example:
Calculate the variance and standard deviation for the sample data given in the frequency
table below.
Class interval    Frequency, f
1 – 3             5
4 – 6             3
7 – 9             2
10 – 12           1
13 – 15           6
16 – 18           4
(Answer: $s^2 = 34.714$, $s = 5.892$)
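A minimal sketch of this calculation in Python, taking x as the class mark (midpoint) of each interval and n as the total frequency. The variable names are chosen for illustration.

```python
import math

# (lower limit, upper limit, frequency) from the table above
table = [(1, 3, 5), (4, 6, 3), (7, 9, 2), (10, 12, 1), (13, 15, 6), (16, 18, 4)]

n = sum(f for _, _, f in table)                               # total frequency
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in table)        # sum of f * x
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in table)  # sum of f * x^2

s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
s = math.sqrt(s2)
print(n, sum_fx, sum_fx2)         # 21 204.0 2676.0
print(round(s2, 3), round(s, 3))  # 34.714 5.892
```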
3.5 Coefficient of variation
Sometimes we would like to compare the variability of two different data sets that have
different units of measurement. Standard deviation is not suitable since it is a measure of
absolute variability and not of relative variability. The most appropriate measure is the
coefficient of variation (CV), which expresses the standard deviation as a percentage of
the mean and is calculated as follows:
Coefficient of variation, $CV = \dfrac{\text{standard deviation}}{\text{mean}} \times 100\%$
Note: A larger coefficient of variation means that the data is more dispersed and less
consistent.
Example:
The mean for the monthly salaries of a group of employees who work for a company is
given as RM2000.00 and the standard deviation is RM180.00. The ages of the same
employees have a mean of 35 years and standard deviation of 8 years. Is the variation in
the salaries the same as the variation in the ages for these employees?
Answer:
CV for salaries = 9%
CV for ages = 22.9%
The variation of salaries is smaller than the variation of ages.
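The comparison can be reproduced with a small helper. The function name `coefficient_of_variation` is hypothetical; because CV is a ratio, the units (RM and years) cancel, which is what makes the two values comparable.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100

cv_salary = coefficient_of_variation(180.00, 2000.00)  # salaries in RM
cv_age = coefficient_of_variation(8, 35)               # ages in years

print(round(cv_salary, 1), round(cv_age, 1))  # 9.0 22.9
print(cv_salary < cv_age)                     # True: salaries vary less, relatively
```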
3.6 Shape of data distribution
Positions of the mean, median and mode on the histogram or frequency curve can
determine the general shape of the data distribution. Three important shapes arise from
this: symmetric, positively skewed (right-skewed) and negatively skewed (left-skewed)
distributions.
Note that for negatively skewed distribution, most of the data values are on the right of
the mean and accumulate at the upper end of the distribution. As a result, the ‘tail’ is
longer to the left.
For positively skewed distribution, most of the data values are on the left of the mean and
accumulate at the lower end of the distribution. As a result, the ‘tail’ is longer to the right.
The skewness of a data set can also be determined by Pearson’s coefficient of skewness
and is given by
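$S_k = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
or
$S_k = \dfrac{\text{mean} - \text{mode}}{\text{standard deviation}}$
If $S_k < 0$, then the data distribution is skewed to the left. If $S_k > 0$, then the data
distribution is skewed to the right.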
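As an illustration, the median-based form of Pearson's coefficient can be applied to Series C from Section 3.3, where one very large value (489) pulls the mean far above the median. The sketch assumes the sample standard deviation is used, which the notes do not specify, and the function name `pearson_skewness` is illustrative. A positive result indicates a right-skewed distribution.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3(mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # assumption: sample standard deviation
    return 3 * (mean - median) / s

series_c = [1, 489, 2, 3, 5]     # mean = 100, median = 3
sk = pearson_skewness(series_c)
print(round(sk, 3))              # 1.338, positive, so skewed to the right
```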
Tutorial
9 18 22 24 25
1 33 37 43 45
5 12 21 23 25
32 34 38 44 48