
CHAPTER 3

NUMERICAL DESCRIPTIVE MEASURES

3.1 Measures of central tendency for ungrouped data


A measure of central tendency, also called a measure of central location, is a
value that is representative of a set of data. It is used to determine the central value of
the data set. The three important measures of central location are the mean, the median
and the mode. Of the three, the mean is the most commonly used. However, for a data set
with an extreme value (very large or very small), the median is preferred because it is
not pulled toward the extreme value.

Mean
The mean of a data set is the sum of the data entries divided by the number of entries.

Population mean: $\mu = \dfrac{\sum x}{N}$, where x = data item, N = population size

Sample mean: $\bar{x} = \dfrac{\sum x}{n}$, where x = data item, n = sample size

Example:
Find the mean of the following sample data.
(a) -2, 3, 4, 5, 10, 16
Answer: $\bar{x} = 6$

(b) 1.2, 2.3, 2.5, 1.8, 3.2, 5.0, 2.2, 3.8


Answer: $\bar{x} = 2.75$
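
The two answers above can be checked with a minimal Python sketch. The function name `sample_mean` is an illustrative choice, not part of the notes; it simply divides the sum of the entries by the number of entries.

```python
def sample_mean(data):
    """Return the sum of the entries divided by the number of entries."""
    return sum(data) / len(data)

data_a = [-2, 3, 4, 5, 10, 16]
data_b = [1.2, 2.3, 2.5, 1.8, 3.2, 5.0, 2.2, 3.8]

print(sample_mean(data_a))            # 6.0
print(round(sample_mean(data_b), 2))  # 2.75
```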

Median
The median of a data set is the value that lies in the middle of the data when the data set
is ordered. If the data set has an odd number of entries, the median is the middle data
entry. If the data set has an even number of entries, the median is the mean of the two
middle data entries.

Median position = $\left(\dfrac{n + 1}{2}\right)^{\text{th}}$

Example:
Find the median of the following sets of data.
(a) 7, 16, 8, 25, 3, 20, 10
Answer: Median = 10

(b) 20, 4, 18, 7, 14, 12, 6, 2


Answer: Median = 9.5
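
As a rough check of both cases (odd and even number of entries), the rule can be sketched in Python. The function name `median` is chosen here for illustration, and the data set is assumed to be non-empty.

```python
def median(data):
    """Middle entry of the ordered data, or the mean of the two middle entries."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle entry
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle entries

print(median([7, 16, 8, 25, 3, 20, 10]))      # 10
print(median([20, 4, 18, 7, 14, 12, 6, 2]))   # 9.5
```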

Mode
The mode of a data set is the data entry that occurs with the greatest frequency.
Sometimes a mode does not exist, and even if it exists, it may not be unique. If every
entry occurs with the same frequency, the data set has no mode. If two entries
occur with the same greatest frequency, each entry is a mode and the data set is called
bimodal. When more than two entries occur with the same greatest frequency, each entry
is a mode and the data set is called multimodal.

Example:
Find the mode for each of the following data sets.
i) 3, 4, 1, 3, 5, 3, 8, 2, 9, 3, 2
Mode = 3

ii) 2, 8, 1, 2, 9, 10, 8, 11
Mode = 2 and 8 (bimodal)

iii) 3, 6, 7, 6, 3, 7
No mode
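
The three cases above can be reproduced with a short Python sketch. The function name `modes` is hypothetical, and the "no mode" rule coded here (every distinct value equally frequent) is the convention implied by example iii; other texts may treat that case differently. The data set is assumed to be non-empty.

```python
from collections import Counter

def modes(data):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    # Convention taken from example iii: if all distinct values occur
    # equally often, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

print(modes([3, 4, 1, 3, 5, 3, 8, 2, 9, 3, 2]))  # [3]
print(modes([2, 8, 1, 2, 9, 10, 8, 11]))         # [2, 8]
print(modes([3, 6, 7, 6, 3, 7]))                 # []
```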

Quartiles
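Quartiles divide a data set arranged in ascending order into four equal parts. The second
quartile, Q2, is the median. The answers in the examples below are consistent with taking
Q1 and Q3 as the medians of the lower and upper halves of the ordered data, leaving out
the median itself when the number of entries is odd.

Example:
Find the first, second and third quartiles for the following data.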
(a) 5, 3, 2, 14, 19, 8, 10
Answer: $Q_1 = 3$, $Q_2 = 8$, $Q_3 = 14$

(b) 10, 4, 15, 17, 1, 5, 7, 18


Answer: $Q_1 = 4.5$, $Q_2 = 8.5$, $Q_3 = 16$

(c) 3, 19, 8, 4, 21, 15, 9, 20, 10


Answer: $Q_1 = 6$, $Q_2 = 10$, $Q_3 = 19.5$
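
The three answers above agree with a "median of halves" rule, sketched below in Python. This rule is an assumption inferred from the answers (the notes do not state a formula), and the names `median` and `quartiles` are illustrative. Software that interpolates between entries may report slightly different quartiles for the same data.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves.

    When the number of entries is odd, the middle entry is left out of
    both halves. Assumes at least two entries.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # entries below the median position
    upper = ordered[-half:]   # entries above the median position
    return median(lower), median(ordered), median(upper)

print(quartiles([5, 3, 2, 14, 19, 8, 10]))          # (3, 8, 14)
print(quartiles([10, 4, 15, 17, 1, 5, 7, 18]))      # (4.5, 8.5, 16.0)
print(quartiles([3, 19, 8, 4, 21, 15, 9, 20, 10]))  # (6.0, 10, 19.5)
```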

3.2 Measures of central tendency for grouped data

Mean = $\dfrac{\sum f x}{n}$

where f = frequency
x = class mark
n = total frequency

Median = $L_m + \left(\dfrac{\frac{n}{2} - CF_{m-1}}{f_m}\right) w_m$

where $L_m$ = lower class boundary of median class
n = total frequency
$CF_{m-1}$ = cumulative frequency before median class
$w_m$ = median class width
$f_m$ = frequency of median class

Mode = $L_m + \left(\dfrac{D_1}{D_1 + D_2}\right) w_m$

where $L_m$ = lower class boundary of mode class
$w_m$ = mode class width
$D_1$ = excess mode class frequency over next lower class
$D_2$ = excess mode class frequency over next higher class

Example:
Find the mean, median and mode for the following data. Round your answer to three
decimal places.

Class interval    Frequency, f
1 – 3             5
4 – 6             3
7 – 9             2
10 – 12           1
13 – 15           6
16 – 18           4

(Answer: $\bar{x} = 9.714$, median = 11, mode = 14.643)
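
The three grouped-data formulas can be traced on this table with the Python sketch below. The function names and the constant `GAP` are illustrative assumptions: `GAP = 1` reflects that consecutive class limits in this table differ by 1, so each class boundary lies half a unit outside the class limit. The sketch also assumes the modal class is unique and takes a missing neighbouring class to have frequency 0.

```python
# (lower limit, upper limit, frequency) for each class in the example table
classes = [(1, 3, 5), (4, 6, 3), (7, 9, 2), (10, 12, 1), (13, 15, 6), (16, 18, 4)]

GAP = 1  # assumed distance between one class's upper limit and the next lower limit

def grouped_mean(classes):
    n = sum(f for _, _, f in classes)
    # class mark = midpoint of the class limits
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes):
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency before the current class
    for lo, hi, f in classes:
        if cumulative + f >= n / 2:      # first class reaching n/2 is the median class
            boundary = lo - GAP / 2      # lower class boundary
            width = hi - lo + GAP        # class width
            return boundary + (n / 2 - cumulative) / f * width
        cumulative += f

def grouped_mode(classes):
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))          # first class with the highest frequency
    lo, hi, f = classes[i]
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1, d2 = f - before, f - after
    return (lo - GAP / 2) + d1 / (d1 + d2) * (hi - lo + GAP)

print(round(grouped_mean(classes), 3))    # 9.714
print(round(grouped_median(classes), 3))  # 11.0
print(round(grouped_mode(classes), 3))    # 14.643
```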

3.3 Measures of variation of ungrouped data
A measure of variation (measure of dispersion) describes how spread out or scattered a
set of numeric data is. It helps us to study the extent to which items vary from one
another and from the central value.

Example:
         Series A    Series B    Series C
         100         100         1
         100         105         489
         100         102         2
         100         103         3
         100         90          5
Total    500         500         500

Each series has a mean of 100.
Series A: None of the items differs from the mean, so there is no variation. ($s = 0$)
Series B: One item is equal to the mean and the others vary from it only slightly.
($s = 5.87$)
Series C: Not a single item is equal to the mean, and the items vary widely from one
another. ($s = 217.46$)
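
The three values of s quoted above can be reproduced from the definition of the sample standard deviation (squared deviations from the mean, divided by n - 1). The sketch below is only an illustration; the name `sample_sd` is not from the notes.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation computed from deviations about the mean."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / (n - 1))

series = {
    "A": [100, 100, 100, 100, 100],
    "B": [100, 105, 102, 103, 90],
    "C": [1, 489, 2, 3, 5],
}

for name, values in series.items():
    print(name, round(sample_sd(values), 2))
# A 0.0
# B 5.87
# C 217.46
```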

The difference in spread can be quantified by a measure of dispersion. Four
common measures of dispersion are the range, inter-quartile range, variance and standard
deviation. The range is not a good measure of dispersion because it is influenced by
extreme values and its calculation uses only two observations. Variance
and standard deviation are the most useful and widely used measures of dispersion.
Although they are also influenced by extreme values, their calculations cover all
the observations.

Generally, a large standard deviation for a data set means that the spread of the
observations around the mean is also large.

Range
The range of a data set is the difference between the maximum and minimum data entries
in the set.
Range = (Maximum data entry) - (Minimum data entry)

Example:
Chicago Corporation hired 10 graduates. The starting salaries (in $'00) are shown. Find
the range of the starting salaries.
Salaries 41 38 39 45 47 41 44 41 37 42

Range = $4700 – $3700 = $1000

Inter-quartile range
The inter-quartile range is defined as the difference between the third quartile and the
first quartile.
Inter-quartile range = $Q_3 - Q_1$

Example:
Find the inter-quartile range for the following data.
3, 2, 4, 10, 4, 8, 9, 10, 19
Answer: Inter-quartile range = 6.5
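
Both the range example and the inter-quartile range example can be checked with the sketch below. The quartile rule used (medians of the lower and upper halves, leaving out the middle entry when the count is odd) is an assumption that matches the answers in these notes; the function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def interquartile_range(data):
    """Q3 - Q1, with Q1 and Q3 taken as medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])

salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]  # in hundreds of dollars
print((max(salaries) - min(salaries)) * 100)                 # 1000
print(interquartile_range([3, 2, 4, 10, 4, 8, 9, 10, 19]))   # 6.5
```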

Standard deviation and variance (Ungrouped data)

For calculation purposes, the formulas can be simplified as

Population variance: $\sigma^2 = \dfrac{\sum x^2 - \dfrac{(\sum x)^2}{N}}{N}$

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample variance: $s^2 = \dfrac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1}$

Sample standard deviation: $s = \sqrt{s^2}$

Example:
Find the variance and standard deviation for the following sample data. Round your
answers to three decimal places.
5, 2, 3, 4, 5, 6, 3
Answer: $s^2 = 2$, $s = 1.414$
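
The simplified sample formula can be followed step by step in Python. This is a sketch only; `sample_variance` is an illustrative name. For this data set, the sum of x is 28 and the sum of x squared is 124, so the variance is (124 - 784/7) / 6 = 2.

```python
from math import sqrt

def sample_variance(data):
    """Sample variance from the simplified formula using sum(x) and sum(x^2)."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [5, 2, 3, 4, 5, 6, 3]
s2 = sample_variance(data)
print(s2)                  # 2.0
print(round(sqrt(s2), 3))  # 1.414
```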

Example:
Calculate the standard deviations for the following sets of sample data. Hence, determine
which one is more dispersed about the mean than the other.

Data A: 2, 7, 10, 9, 2, 5, 16
Data B: 10, 8, 14, 20, 40, 32, 1, 4, 8, 36, 12, 32
Answer:
Data A: $s = 4.957$
Data B: $s = 13.501$
Data B is more dispersed than data A.
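
As a cross-check, Python's standard `statistics.stdev` function (which computes the sample standard deviation, with divisor n - 1) gives the same two values. The variable names are illustrative.

```python
from statistics import stdev

data_a = [2, 7, 10, 9, 2, 5, 16]
data_b = [10, 8, 14, 20, 40, 32, 1, 4, 8, 36, 12, 32]

s_a = stdev(data_a)  # sample standard deviation of Data A
s_b = stdev(data_b)  # sample standard deviation of Data B

print(round(s_a, 3))  # 4.957
print(round(s_b, 3))  # 13.501
print("B is more dispersed" if s_b > s_a else "A is at least as dispersed")
```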

3.4 Measures of variation of grouped data

Standard deviation and variance (Grouped data)

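For calculation purposes, the formulas can be simplified as

Population variance: $\sigma^2 = \dfrac{\sum f x^2 - \dfrac{(\sum f x)^2}{N}}{N}$

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample variance: $s^2 = \dfrac{\sum f x^2 - \dfrac{(\sum f x)^2}{n}}{n - 1}$

Sample standard deviation: $s = \sqrt{s^2}$

Here f = frequency, x = class mark, and N or n = total frequency.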

Example:
Calculate the variance and standard deviation for the sample data given in the frequency
table below.
Class interval    Frequency, f
1 – 3             5
4 – 6             3
7 – 9             2
10 – 12           1
13 – 15           6
16 – 18           4
(Answer: $s^2 = 34.714$, $s = 5.892$)
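
A minimal sketch of the grouped sample formula applied to this table follows. The class marks are the midpoints of the class limits, and `grouped_sample_variance` is an illustrative name. For this table the sum of fx is 204 and the sum of fx squared is 2676.

```python
from math import sqrt

marks = [2, 5, 8, 11, 14, 17]  # class marks (midpoints of 1-3, 4-6, ...)
freqs = [5, 3, 2, 1, 6, 4]     # class frequencies

def grouped_sample_variance(marks, freqs):
    """Sample variance for grouped data from sum(f*x) and sum(f*x^2)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(marks, freqs))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

s2 = grouped_sample_variance(marks, freqs)
print(round(s2, 3))        # 34.714
print(round(sqrt(s2), 3))  # 5.892
```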

3.5 Coefficient of variation
Sometimes we would like to compare the variability of two data sets that have
different units of measurement. Standard deviation is not suitable since it is a measure of
absolute variability and not of relative variability. The most appropriate measure is the
coefficient of variation (CV), which expresses the standard deviation as a percentage of the
mean and is calculated as follows:

Coefficient of variation, $CV = \dfrac{\text{standard deviation}}{\text{mean}} \times 100\%$

Note: A larger coefficient of variation means that the data is more dispersed and less
consistent.

Example:
The mean monthly salary of a group of employees who work for a company is
RM2000.00 and the standard deviation is RM180.00. The ages of the same
employees have a mean of 35 years and a standard deviation of 8 years. Is the variation in
the salaries the same as the variation in the ages for these employees?
Answer:
CV for salaries = 9%
CV for ages = 22.9%
The variation of salaries is smaller than the variation of ages.
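
The comparison can be sketched in a few lines of Python. The function name `coefficient_of_variation` is illustrative; it returns the CV as a percentage and assumes a non-zero mean.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_salary = coefficient_of_variation(180, 2000)  # salaries in RM
cv_age = coefficient_of_variation(8, 35)         # ages in years

print(round(cv_salary, 1))  # 9.0
print(round(cv_age, 1))     # 22.9
```

Because the CV has no unit, the two percentages can be compared directly even though one data set is in ringgit and the other in years.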

3.6 Shape of data distribution
The positions of the mean, median and mode on a histogram or frequency curve
indicate the general shape of the data distribution. Three important shapes are the
symmetric distribution, the positively skewed (right-skewed) distribution and the
negatively skewed (left-skewed) distribution.

Symmetric: mean = median = mode

Negatively skewed: mode > median > mean

Positively skewed: mean > median > mode

Note that for a negatively skewed distribution, most of the data values are on the right of
the mean and accumulate at the upper end of the distribution. As a result, the ‘tail’ is
longer to the left.
For a positively skewed distribution, most of the data values are on the left of the mean and
accumulate at the lower end of the distribution. As a result, the ‘tail’ is longer to the right.

The skewness of a data set can also be determined by Pearson’s coefficient of skewness,
which is given by

$S_k = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

or

$S_k = \dfrac{\text{mean} - \text{mode}}{\text{standard deviation}}$

If $S_k < 0$, then the data distribution is skewed to the left. If $S_k > 0$, then the data
distribution is skewed to the right. If $S_k = 0$, then the data distribution is symmetrical.
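
The median-based form of the coefficient and the sign rule can be sketched as follows. The function names are illustrative, and the input values (mean 28, median 28.8333, standard deviation 13.3278) are borrowed from the grouped-data answers quoted in the tutorial at the end of this chapter.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's coefficient of skewness, median form: 3(mean - median) / sd."""
    return 3 * (mean - median) / sd

def describe(sk):
    """Interpret the sign of the coefficient."""
    if sk < 0:
        return "skewed to the left"
    if sk > 0:
        return "skewed to the right"
    return "symmetrical"

sk = pearson_skewness(28, 28.8333, 13.3278)
print(round(sk, 3), describe(sk))  # a small negative value, skewed to the left
```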

Tutorial

9 18 22 24 25
1 33 37 43 45
5 12 21 23 25
32 34 38 44 48

From the data above, find:


1. mean. [26.95]
2. median. [25]
3. mode. [25]
4. quartiles. [$Q_1 = 19.5$, $Q_2 = 25$, $Q_3 = 37.5$]
5. inter-quartile range. [18]
6. variance. [185.5237]
7. standard deviation. [13.6207]
8. range. [47]
9. frequency distribution.

Based on your answer to question 9, find:


10. mean. [28]
11. median. [28.8333]
12. mode. [28.5]
13. variance. [177.6316]
14. standard deviation. [13.3278]
15. shape of data distribution. Interpret your answer. [$S_k < 0$. Skewed to the left]

16. histogram. Hence, estimate mode. [between 27.8 and 29.2]


17. ogive. Hence, estimate median. [between 28.1 and 29.5]
