
BAMS1743 QUANTITATIVE METHODS

CHAPTER 2: DATA DESCRIPTION (B)

MEASURES OF DISPERSION
• Measures of dispersion describe the spread or variability of a set of
data. They give additional information for judging the reliability of the
measure of central tendency and help in comparing the dispersion that is
present in various samples.

• Two data sets can have the same mean, the same median and the same
mode, and yet be very different in other respects.

• Example: consider the heights (cm) of five employees from each of the
sales and production departments as shown:

Sales department: 183 185 193 193 198


Production department: 170 183 193 193 213

The two groups have the same mean height, 190.4 cm, the same median
height, 193 cm, and the same modal height, 193 cm. Nonetheless, it is
clear that the two data sets differ. To describe this difference
quantitatively, we use a measure of dispersion.
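
A quick numerical check of this example can be sketched in Python with the standard library only. The function and variable names are illustrative assumptions, and the range and population standard deviation are used here simply as two of the measures of dispersion that this chapter goes on to define.

```python
import statistics

# Heights (cm) of the five employees in each department, from the example
sales = [183, 185, 193, 193, 198]
production = [170, 183, 193, 193, 213]

def summarise(heights):
    """Return central-tendency and dispersion measures for one group."""
    return {
        "mean": statistics.mean(heights),
        "median": statistics.median(heights),
        "mode": statistics.mode(heights),
        "range": max(heights) - min(heights),
        "population_sd": statistics.pstdev(heights),
    }

for name, heights in [("Sales", sales), ("Production", production)]:
    print(name, summarise(heights))
```

The mean, median and mode agree for the two groups, while the range and the standard deviation are clearly larger for the production department.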

• There are several commonly used measures of dispersion: the range,
quartile deviation, variance, standard deviation and coefficient of
variation.

• The more spread out or dispersed the data, the larger the range, the
quartile deviation, the variance and the standard deviation.

Range
• The range is the difference between the largest and the smallest
observations in a data set.
• For raw data: Range = largest value – smallest value

Example: Find the range for the following data.


10 15 17 20 25 29 30 35 38 40 45

Solution: Range =

• For grouped data (discrete):
Range = upper class limit of the last class − lower class limit of the first class

• For grouped data (continuous):
Range = upper class boundary of the last class − lower class boundary of the first class

Example (Discrete)
The following table shows the daily outputs of 80 workers in a factory.
Determine the range.
Daily output (units) Number of workers
10 – 19 6
20 – 29 10
30 – 39 30
40 – 49 20
50 – 59 14
60 – 69 4
Solution: Range =

Example (Continuous)
Find the range of the following frequency distribution of the time
spent by a group of students.
Time Spent Number of students
0 –< 6 2
6 –< 12 4
12 –< 18 10
18 –< 24 12
24 –< 30 8
Solution: Range =
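
The three range rules above can be sketched in Python as follows. This is a minimal illustration rather than a prescribed method: the function names and the representation of each class as a (lower, upper) pair are assumptions made for the sketch, and the data are taken from the three examples above.

```python
def range_raw(values):
    """Range of raw data: largest value minus smallest value."""
    return max(values) - min(values)

def range_grouped(classes):
    """Range of grouped data.

    `classes` is a list of (lower, upper) pairs in ascending order:
    class limits for discrete data, class boundaries for continuous data.
    """
    first_lower = classes[0][0]   # lower limit/boundary of the first class
    last_upper = classes[-1][1]   # upper limit/boundary of the last class
    return last_upper - first_lower

raw_data = [10, 15, 17, 20, 25, 29, 30, 35, 38, 40, 45]
daily_output_limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
time_spent_boundaries = [(0, 6), (6, 12), (12, 18), (18, 24), (24, 30)]

print("Raw data:", range_raw(raw_data))
print("Discrete grouped data:", range_grouped(daily_output_limits))
print("Continuous grouped data:", range_grouped(time_spent_boundaries))
```

Only the first and last classes enter the calculation, so the class frequencies are not needed.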

• Advantage:
It is easy to understand and simple to calculate.
• Disadvantage:
Since only the largest and the smallest values are considered, it can be
strongly influenced by them, especially if they are unrepresentative
extreme values.

Quartile deviation (Semi-inter quartile range)

• The disadvantage of the range can be overcome by ignoring the extreme
values. This is done by ignoring the top and the bottom quarters and
considering only the range between the quartiles (called the
inter-quartile range).
Inter-quartile range = $Q_3 - Q_1$

• If the inter-quartile range is divided by two, the figure obtained is called
the semi-inter-quartile range or quartile deviation, which gives the average
amount by which Q1 and Q3 differ from the median.
Quartile deviation = semi-inter-quartile range = $\dfrac{Q_3 - Q_1}{2}$
• Definition of quartiles:
(a) The first quartile, also called the lower quartile, is denoted by Q1. It is
defined as the value of the item one-quarter of the way through a
distribution.
(b) The third quartile, also called the upper quartile, is denoted by Q3. It
is defined as the value of the item three-quarters of the way through a
distribution.

Quartiles divide the data into 4 equal parts. Thus, with the quartiles
known, we can say that a quarter of the observations lies below the first
quartile. A quarter lies above the third quartile while half of the
observations lies between the two quartiles.

• Computation of the quartiles:
(a) Raw data
1. Arrange the data into an array in ascending order of magnitude.
2. Locate the quartile items as:
$Q_1$ = value of the $\dfrac{n+1}{4}$th item
$Q_3$ = value of the $\dfrac{3(n+1)}{4}$th item
where n = number of items in the data set.

Example

The following array shows the daily income (in RM) of 10 factory workers:
20, 25, 26, 30, 32, 36, 38, 38, 40, and 50.

Calculate (a) the quartiles;
(b) the inter-quartile range;
(c) the quartile deviation.

Solution:
(a) n = 10

$Q_1$ = value of the $\dfrac{n+1}{4}$th item = value of the $\dfrac{10+1}{4}$th item
= value of the 2.75th item
= value of 2nd item + 0.75 × (3rd item − 2nd item)
= 25 + 0.75 × (26 − 25) = 25.75 (RM)

$Q_3$ = value of the $\dfrac{3(n+1)}{4}$th item = value of the ______th item
= value of the ______th item
= value of ______th item + ______ × (______th item − ______th item)
= ______

(b) Inter-quartile range = $Q_3 - Q_1$ = ______

(c) Quartile deviation = ______
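
The positional rule used in this example, with linear interpolation for a fractional position such as 2.75, can be sketched in Python as below. The helper names `value_of_item` and `quartiles_raw` are hypothetical, and clamping positions that fall outside the data to the end values is an assumption of this sketch rather than something stated in the notes. It can be used to check a hand calculation.

```python
def value_of_item(sorted_values, position):
    """Value of the `position`-th item (1-based) with linear interpolation.

    A fractional position such as 2.75 means: the 2nd item plus
    0.75 of the gap between the 2nd and 3rd items.
    """
    n = len(sorted_values)
    whole = int(position)          # e.g. 2 for position 2.75
    fraction = position - whole    # e.g. 0.75
    if whole < 1:                  # assumption: clamp below the first item
        return sorted_values[0]
    if whole >= n:                 # assumption: clamp at the last item
        return sorted_values[-1]
    lower = sorted_values[whole - 1]
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)

def quartiles_raw(values):
    """Return (Q1, Q3) using the (n + 1)/4 and 3(n + 1)/4 positions."""
    data = sorted(values)
    n = len(data)
    q1 = value_of_item(data, (n + 1) / 4)
    q3 = value_of_item(data, 3 * (n + 1) / 4)
    return q1, q3

# Daily income (RM) of the 10 factory workers in the example
income = [20, 25, 26, 30, 32, 36, 38, 38, 40, 50]
q1, q3 = quartiles_raw(income)
inter_quartile_range = q3 - q1
quartile_deviation = inter_quartile_range / 2
print(q1, q3, inter_quartile_range, quartile_deviation)
```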

• Advantages:
It can be computed even though the end values of the distribution are
not known, as with open-ended classes.
It is not influenced by extreme values.
• Disadvantage:
It is not fully representative of a set of measurements as it is not based
on all the information available.

(b) Grouped data

1. Calculate the cumulative frequencies to position the items in
ascending order.

2. Locate the quartile classes as:
$Q_1$ = value of the $\dfrac{n}{4}$th item
$Q_3$ = value of the $\dfrac{3n}{4}$th item

3. Find the quartiles using (1) an ogive; or
(2) the linear interpolation formula:

$Q_1 = L_{Q_1} + \dfrac{c_{Q_1}}{f_{Q_1}} \left( \dfrac{n}{4} - \sum f_{Q_1 - 1} \right)$
where $L_{Q_1}$ = lower class boundary of the Q1 class
$c_{Q_1}$ = class size of the Q1 class
$f_{Q_1}$ = frequency of the Q1 class
$\sum f_{Q_1 - 1}$ = cumulative frequency of the class preceding the Q1 class

$Q_3 = L_{Q_3} + \dfrac{c_{Q_3}}{f_{Q_3}} \left( \dfrac{3n}{4} - \sum f_{Q_3 - 1} \right)$
where $L_{Q_3}$ = lower class boundary of the Q3 class
$c_{Q_3}$ = class size of the Q3 class
$f_{Q_3}$ = frequency of the Q3 class
$\sum f_{Q_3 - 1}$ = cumulative frequency of the class preceding the Q3 class

Example
The following frequency distribution shows the daily production level in a
production line.
Production (units) Number of days
13 – 17 2
18 – 22 22
23 – 27 10
28 – 32 14
33 – 37 3
38 – 42 4
43 – 47 6
48 – 52 1
Calculate the quartile deviation using
(a) the linear interpolation method;
(b) an ogive.

Solution:
Production (units) Number of days Cumulative frequency
0
13 – 17 2 2
18 – 22 22 24
23 – 27 10 34
28 – 32 14 48
33 – 37 3 51
38 – 42 4 55
43 – 47 6 61
48 – 52 1 62

n = 62
$Q_1$ = value of the $\dfrac{62}{4}$th item = value of the 15.5th item
$Q_3$ = value of the ______th item = value of the ______th item

(a) Q1 class boundaries: 17.5 – 22.5
$Q_1 = 17.5 + \dfrac{5}{22}(15.5 - 2) = 20.57$ units

Q3 class boundaries: ______
$Q_3$ = ______

Quartile deviation = ______

(b)
[Figure: '<' ogive, "Production at a production line for a period of 62 days".
Vertical axis: cumulative frequency (0 to 70). Horizontal axis: production
(units), marked at the class boundaries 12.5, 17.5, ..., 52.5.]

From the ‘<’ ogive, Q1 = 20.5 units, which shows that on 25% of the days
production was less than or equal to 20.5 units and on the other 75% of the
days production was more than or equal to 20.5 units.

From the ‘<’ ogive, Q3 = 32 units, which shows that on 75% of the days
production was less than or equal to 32 units and on the other 25% of the
days production was more than or equal to 32 units.

∴ Quartile deviation = $\dfrac{Q_3 - Q_1}{2} = \dfrac{32 - 20.5}{2} = 5.75$ units
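
The linear interpolation formula for grouped data can be sketched in Python as below, as a way of checking the hand calculation against the ogive readings. The function name `grouped_quantile` and the choice to pass the class boundaries as a single list are assumptions made for this sketch. Small differences from values read off a hand-drawn ogive are to be expected.

```python
def grouped_quantile(boundaries, frequencies, target):
    """Linear-interpolation estimate of the value of the `target`-th item.

    boundaries  : class boundaries (one more entry than `frequencies`)
    frequencies : class frequencies in ascending class order
    target      : item position, e.g. n/4 for Q1 or 3n/4 for Q3
    """
    cumulative_before = 0  # cumulative frequency of the preceding classes
    for i, f in enumerate(frequencies):
        if f > 0 and cumulative_before + f >= target:
            lower_boundary = boundaries[i]
            class_size = boundaries[i + 1] - boundaries[i]
            return lower_boundary + class_size / f * (target - cumulative_before)
        cumulative_before += f
    raise ValueError("target exceeds the total frequency")

# Daily production example: classes 13-17, 18-22, ..., 48-52
boundaries = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5]
frequencies = [2, 22, 10, 14, 3, 4, 6, 1]
n = sum(frequencies)

q1 = grouped_quantile(boundaries, frequencies, n / 4)
q3 = grouped_quantile(boundaries, frequencies, 3 * n / 4)
quartile_deviation = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2))
```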

Range based on percentiles

Percentiles are the summary measures that divide a ranked data set into 100
equal parts. There are 99 percentiles in a ranked data set.

Consider n items arranged in ascending order. Then,

the $k$th percentile, $P_k$ = value of the $\dfrac{k}{100}(n+1)$th item

∴ $P_{25} = Q_1$; $P_{50} = Q_2$; $P_{75} = Q_3$

a) Raw data

Example: Find the P10 and P90 for the following raw data.

63, 105, 30, 43, 53, 73, 65, 77, 89, 70, 68, 47, 38, 34, 41, 80, 60, 54, 59

Solution:
Arrange the data in ascending order:
30, 34, 38, 41, 43, 47, 53, 54, 59, 60, 63, 65, 68, 70, 73, 77, 80, 89, 105

P10 =

P90 =
b) Ungrouped frequency distribution

Example: Find the P10 and P90 for the following distribution.
Marks 10 20 30 40 50 60
Number of students 3 9 20 8 5 4
Cumulative frequency 3 12 32 40 45 49
Solution:
P10 =
P90 =
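
The positional rule for percentiles can be sketched in Python for both raw data and an ungrouped frequency distribution. This is a sketch under stated assumptions: the names `percentile_raw` and `expand` are hypothetical, the ungrouped distribution is handled by expanding it back into raw data before applying the same k(n + 1)/100 rule, and positions outside the data are clamped to the end values.

```python
def percentile_raw(values, k):
    """k-th percentile: value of the k(n + 1)/100-th item of the ordered data."""
    data = sorted(values)
    n = len(data)
    position = k * (n + 1) / 100
    whole = int(position)
    fraction = position - whole
    if whole < 1:                 # assumption: clamp below the first item
        return data[0]
    if whole >= n:                # assumption: clamp at the last item
        return data[-1]
    # Interpolate between the neighbouring items for a fractional position
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])

def expand(values, frequencies):
    """Turn an ungrouped frequency distribution back into raw data."""
    raw = []
    for value, frequency in zip(values, frequencies):
        raw.extend([value] * frequency)
    return raw

# a) Raw data from the example
scores = [63, 105, 30, 43, 53, 73, 65, 77, 89, 70,
          68, 47, 38, 34, 41, 80, 60, 54, 59]
print(percentile_raw(scores, 10), percentile_raw(scores, 90))

# b) Ungrouped frequency distribution from the example (marks, students)
marks = expand([10, 20, 30, 40, 50, 60], [3, 9, 20, 8, 5, 4])
print(percentile_raw(marks, 10), percentile_raw(marks, 90))
```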

c) Grouped frequency distribution
$P_k = L_{P_k} + \dfrac{C_{P_k}}{f_{P_k}} \left( \dfrac{nk}{100} - \sum f_{P_k - 1} \right)$

where
$L_{P_k}$ = lower class boundary of the percentile class
$C_{P_k}$ = the size of the percentile class
$f_{P_k}$ = frequency of the percentile class
$\sum f_{P_k - 1}$ = the cumulative frequency in the class before the percentile class
n = total frequency

Example
The following frequency distribution shows the daily production level in a
production line. Compute P10 and P90 using (a) formula and (b) ogive.
Production (units) Number of days
13 – 17 2
18 – 22 22
23 – 27 10
28 – 32 14
33 – 37 3
38 – 42 4
43 – 47 6
48 – 52 1

Solution:
Production (units) Number of days Cumulative frequency
0
13 – 17 2 2
18 – 22 22 24
23 – 27 10 34
28 – 32 14 48
33 – 37 3 51
38 – 42 4 55
43 – 47 6 61
48 – 52 1 62

n = 62
$P_{10}$ = value of the $\dfrac{10(62)}{100}$th item = value of the 6.2th item
$P_{90}$ = value of the ______th item = value of the ______th item

(a) P10 class boundaries: 17.5 – 22.5
$P_{10} = 17.5 + \dfrac{5}{22}(6.2 - 2) = 18.4545$ units

P90 class boundaries: ______
$P_{90}$ = ______
(b)
[Figure: '<' ogive, "Production at a production line for a period of 62 days".
Vertical axis: cumulative frequency (0 to 70). Horizontal axis: production
(units), marked at the class boundaries 12.5, 17.5, ..., 52.5.]

From the ‘<’ ogive, P10 = ______ units and P90 = ______ units.
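
Reading a percentile from the '<' ogive can be imitated numerically. The sketch below assumes the ogive is drawn with straight line segments between the plotted points (class boundary, cumulative frequency); a smoothed hand-drawn curve may give slightly different readings. The function names `ogive_points` and `read_ogive` are hypothetical.

```python
def ogive_points(boundaries, frequencies):
    """Points (class boundary, cumulative frequency) of the '<' ogive."""
    points = [(boundaries[0], 0)]   # the ogive starts at zero
    cumulative = 0
    for upper_boundary, f in zip(boundaries[1:], frequencies):
        cumulative += f
        points.append((upper_boundary, cumulative))
    return points

def read_ogive(points, target):
    """Read the x-value where the ogive reaches cumulative frequency `target`."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            # Straight-line segment between two neighbouring plotted points
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the ogive")

boundaries = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5]
frequencies = [2, 22, 10, 14, 3, 4, 6, 1]
n = sum(frequencies)

points = ogive_points(boundaries, frequencies)
p10 = read_ogive(points, 10 * n / 100)
p90 = read_ogive(points, 90 * n / 100)
print(round(p10, 2), round(p90, 2))
```

With straight segments, this reading agrees with the grouped-data interpolation formula, because both interpolate linearly within the percentile class.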

Standard Deviation

• Standard deviation is the root-mean-square deviation between the
individual values and the mean in a distribution.

• Consider a set of data: $x_1, x_2, \ldots, x_n$

Let the mean of the data be $\bar{x}$.

Deviations of each value from the mean: $x_1 - \bar{x},\ x_2 - \bar{x},\ \ldots,\ x_n - \bar{x}$

Squared deviations: $(x_1 - \bar{x})^2,\ (x_2 - \bar{x})^2,\ \ldots,\ (x_n - \bar{x})^2$

Mean-square deviation: $\dfrac{\sum (x - \bar{x})^2}{n}$

Root-mean-square deviation: $\sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

Computation of the standard deviation:

(a) Raw data

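Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

Sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Alternatively:

Population standard deviation: $\sigma = \sqrt{\dfrac{\sum x^2}{N} - \left( \dfrac{\sum x}{N} \right)^2}$

Sample standard deviation: $s = \sqrt{\dfrac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1}}$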

• The standard deviation computed from population data is denoted by the
symbol σ (pronounced "sigma"); the standard deviation computed from
sample data is denoted by s.
Variance

• Variance is the mean-square deviation between the individual values and
the mean in a distribution.

• Variance is the square of the standard deviation of a distribution.

• In general, it is difficult to interpret the value of the variance
because its units are squared. Hence, the standard deviation is
more frequently used.

Example
Find the standard deviation and variance for the following data:
2, 12, 7, 5, 9

Solution
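N = 5
∑x = 2 + 12 + 7 + 5 + 9 = 35
∑x² = 2² + 12² + 7² + 5² + 9² = 303
Population standard deviation,
$\sigma = \sqrt{\dfrac{\sum x^2}{N} - \left( \dfrac{\sum x}{N} \right)^2} = \sqrt{\dfrac{303}{5} - \left( \dfrac{35}{5} \right)^2} = \sqrt{11.6} \approx 3.41$
Population variance, $\sigma^2 \approx 3.41^2 \approx 11.6$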
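
As a check on this example, the definition form and the alternative (shortcut) form of the population standard deviation can be compared in Python. The function names are illustrative assumptions; both functions follow the two raw-data formulas given earlier in this section.

```python
import math

def population_sd_definition(values):
    """Root-mean-square deviation from the mean (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def population_sd_shortcut(values):
    """Alternative form: sqrt(sum(x^2)/N - (sum(x)/N)^2)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt(sum_x2 / n - (sum_x / n) ** 2)

data = [2, 12, 7, 5, 9]
sd_definition = population_sd_definition(data)
sd_shortcut = population_sd_shortcut(data)
print(round(sd_definition, 2), round(sd_shortcut, 2))   # both round to 3.41
print(round(sd_definition ** 2, 1))                     # population variance
```

The two forms are algebraically equivalent, so any difference between them comes only from floating-point rounding.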

Example
During a particular summer month, the number of central air-conditioning
units sold by a random sample of 5 salespersons from a heating and air-
conditioning firm were as follows:
8, 11, 5, 12, 8

Find the sample standard deviation and the sample variance.

Solution
n = 5
∑x = ______
∑x² = ______

Sample standard deviation,
$s = \sqrt{\dfrac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1}}$ = ______

Sample variance, $s^2$ = ______
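
The sample formula, with its divisor of n − 1, can be sketched in Python and cross-checked against the standard library, whose `statistics.variance` and `statistics.stdev` functions also use the n − 1 divisor. The function name `sample_variance` is a hypothetical helper for this sketch, which can be used to verify a hand calculation for the air-conditioning example.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Units sold by the random sample of 5 salespersons
units_sold = [8, 11, 5, 12, 8]

s_squared = sample_variance(units_sold)
s = math.sqrt(s_squared)
print(round(s, 4), round(s_squared, 4))

# Cross-check with the standard library (also divides by n - 1)
print(round(statistics.stdev(units_sold), 4),
      round(statistics.variance(units_sold), 4))
```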

(b) Grouped data

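Population standard deviation: $\sigma = \sqrt{\dfrac{\sum f(x - \mu)^2}{\sum f}}$

Sample standard deviation: $s = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{n - 1}}$, where $n = \sum f$

Alternatively:

Population standard deviation: $\sigma = \sqrt{\dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2}$

Sample standard deviation: $s = \sqrt{\dfrac{\sum fx^2 - \dfrac{(\sum fx)^2}{n}}{n - 1}}$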

Example
Find the mean and standard deviation of the following frequency distribution.
Class interval Frequency
0–6 2
6 – 12 4
12 – 18 10
18 – 24 12
24 – 30 8
30 – 36 4

Solution
Class interval   f    Class mark, x   fx    fx²
0 – 6            2    3               6     18
6 – 12           4    9               36    324
12 – 18          10   15              150   2250
18 – 24          12   21              252   5292
24 – 30          8    27              216   5832
30 – 36          4    33              132   4356
Total            40                   792   18072

Population mean, $\mu = \dfrac{\sum fx}{\sum f} = \dfrac{792}{40} = 19.8$

Population standard deviation,
$\sigma = \sqrt{\dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2} = \sqrt{\dfrac{18072}{40} - \left( \dfrac{792}{40} \right)^2} = \sqrt{59.76} \approx 7.73$
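
The grouped-data calculation in this solution can be sketched in Python. The function name `grouped_mean_sd` and its `sample` flag are assumptions for illustration: with `sample=False` it applies the population formula used in this solution, and with `sample=True` it applies the sample formula with the n − 1 divisor. Class marks are taken as the midpoints of the class limits.

```python
import math

def grouped_mean_sd(class_limits, frequencies, sample=False):
    """Mean and standard deviation of a grouped frequency distribution.

    class_limits : list of (lower, upper) pairs; class mark = midpoint
    frequencies  : class frequencies
    sample       : False -> population formula, True -> divide by n - 1
    """
    marks = [(lower + upper) / 2 for lower, upper in class_limits]
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, marks))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, marks))
    mean = sum_fx / n
    if sample:
        variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    else:
        variance = sum_fx2 / n - mean ** 2
    return mean, math.sqrt(variance)

classes = [(0, 6), (6, 12), (12, 18), (18, 24), (24, 30), (30, 36)]
freqs = [2, 4, 10, 12, 8, 4]
mean, sd = grouped_mean_sd(classes, freqs)
print(round(mean, 2), round(sd, 2))   # 19.8 7.73
```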

Example
The output distribution for a sample of 100 workers in BB Company is shown
below:
Output (units) Number of workers
21 – 25 10
26 – 30 35
31 – 35 16
36 – 40 14
41 – 45 12
46 – 50 10
51 – 55 3
Calculate the mean and the standard deviation.

Solution
Output (units) f Class mark, x fx fx²
21 – 25 10 23 230
26 – 30 35 28 980
31 – 35 16 33 528
36 – 40 14 38 532
41 – 45 12 43 516
46 – 50 10 48 480
51 – 55 3 53 159
Total 100 3425

Sample mean, $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{3425}{100} = 34.25$ units

Sample standard deviation,
$s = \sqrt{\dfrac{\sum fx^2 - \dfrac{(\sum fx)^2}{n}}{n - 1}}$ = ______

COEFFICIENT OF VARIATION (CV)


• Absolute measures of dispersion, e.g. the standard deviation, measure the
degree of dispersion within a series or a distribution.

• Relative measures of dispersion are used to compare the variability
between two or more different distributions.

• A relative measure of dispersion is a measure of dispersion expressed
as a percentage of a measure of location. The commonly used relative
measure of dispersion is
Coefficient of variation = $\dfrac{\text{standard deviation}}{\text{mean}} \times 100$

Example
                           Distribution 1   Distribution 2
Standard deviation         27 km            RM 4.6
Mean                       450 km           RM 10
Coefficient of variation   6%               46%

Thus, the values in distribution 2 are more variable than the values in
distribution 1.

Example

Typist A can type 50 words per minute with standard deviation of 5 while
typist B can type 150 words per minute with standard deviation of 10. Which
typist is more consistent in her work?

Solution
The standard deviation of typist B is twice that of typist A, but B types at
three times the speed of A. To take all of this information into account, the
coefficient of variation is used.
Coefficient of variation for A = $\dfrac{5}{50} \times 100 = 10\%$
Coefficient of variation for B = $\dfrac{10}{150} \times 100 = 6.67\%$
The results show that typist B, who has the smaller coefficient of variation,
is more consistent than typist A.
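
The comparison of the two typists can be sketched in Python. The function name `coefficient_of_variation` is an illustrative assumption, and the guard against a zero mean is an addition of this sketch, since the ratio is undefined in that case.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Coefficient of variation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return standard_deviation / mean * 100

# Typing speeds (words per minute) from the example
cv_a = coefficient_of_variation(5, 50)
cv_b = coefficient_of_variation(10, 150)

# The smaller coefficient of variation indicates the more consistent typist
more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_consistent)   # 10.0 6.67 B
```

Because the coefficient of variation is a pure percentage, the same function can compare distributions measured in different units, such as kilometres and ringgit.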

• When making comparisons, the rule of thumb is that the larger the
percentage, the greater the relative variation.

• A larger relative variation implies less consistency, while a smaller
relative variation implies more consistency.

Coefficient of Skewness
• The term skewness is used to describe the shape of a frequency
distribution.
• If the histogram of a frequency distribution is drawn, the distribution is said
to be skewed if the peak of the histogram lies to either side of the centre
of the distribution. The terms positive and negative skewness are used to
describe the direction of the skewness.
• If mean = median = mode, the distribution is said to be symmetrical;
otherwise it is asymmetrical, or skewed.

• Two types of asymmetrical frequency distribution:

i) Positively skewed distribution: mean > median > mode; the tail stretches out to the right.
ii) Negatively skewed distribution: mean < median < mode; the tail stretches out to the left.

• Degree of skewness: Pearson’s coefficient of skewness

$S_k = \dfrac{3(\text{Mean} - \text{Median})}{\text{SD}}$

For population data: $S_k = \dfrac{3(\mu - \tilde{\mu})}{\sigma}$
For sample data: $S_k = \dfrac{3(\bar{x} - \tilde{x})}{s}$

The range of $S_k$ is [-3, 3].
For a symmetrical distribution, $S_k = 0$.

Example:
If $\bar{x} = 30.9$, $\tilde{x} = 28.8$ and $s = 13.23$, find $S_k$ and interpret the value.

Solution:
$S_k = \dfrac{3(30.9 - 28.8)}{13.23} = 0.4762$

Since $S_k$ = 0.4762 is positive and small, the distribution is slightly skewed to the right.
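
Pearson’s coefficient of skewness and its interpretation can be sketched in Python. The function names are hypothetical, and the wording returned by `describe_skewness` only reports the direction given by the sign of the coefficient; judging whether the skewness is slight or strong is left to the reader.

```python
def pearson_skewness(mean, median, standard_deviation):
    """Pearson's coefficient of skewness: 3(mean - median) / SD."""
    return 3 * (mean - median) / standard_deviation

def describe_skewness(sk):
    """Direction of skewness from the sign of the coefficient."""
    if sk > 0:
        return "positively skewed (tail stretches out to the right)"
    if sk < 0:
        return "negatively skewed (tail stretches out to the left)"
    return "symmetrical"

# Values from the example: sample mean, sample median, sample SD
sk = pearson_skewness(30.9, 28.8, 13.23)
print(round(sk, 4), describe_skewness(sk))
```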

BAMS1743 QUANTITATIVE METHODS
TUTORIAL 3 (Measures of dispersion)

1. A manager observes the amount of time taken by his secretary to
prepare a sample of 10 business letters in the office and the results are
arranged in ascending order to the nearest minute:
5, 5, 5, 7, 9, 14, 15, 15, 16, 18.
Determine
(a) the range,
(b) P35 and P60,
(c) the quartiles,
(d) the quartile deviation.

2. The following array shows the amounts spent (in RM) by a random
sample of 15 students at a primary school canteen:
1.50, 1.50, 1.80, 1.80, 1.80, 1.90, 1.90, 2.50, 2.90, 2.90, 3.40,
3.50, 3.80, 4.00, 4.10.
Determine
(a) the quartiles,
(b) the inter-quartile range,
(c) the standard deviation.

3. The following table shows the number of successful sales made by all the
salesmen employed by a large micro-computer firm in a particular quarter.
Number of sales Number of salesmen
0–4 1
5–9 14
10 – 14 23
15 – 19 21
20 – 24 15
25 – 29 6
(a) Calculate the quartile deviation and standard deviation of the
number of sales.

(b) Compare and contrast the quartile deviation and standard
deviation.

4. A company owns two garages, A and B. In garage A, a representative
sample of 200 consumers’ purchases was taken. The results were as
follows:

Petrol purchased (gallons) Number of consumers


0 and < 2 15
2 and < 4 40
4 and < 6 65
6 and < 8 40
8 and < 10 30
10 and < 12 10

(a) Calculate the mean and standard deviation of the number of
gallons purchased.

(b) A similar sample of garage B users showed a mean of 4 gallons
with a standard deviation of 2.2 gallons. In which garage were the
purchases of petrol relatively more variable?

5. The following table shows the projected population for males and
females in a village.

Age group Males Females


0 – < 15 75 72
15 – < 30 59 58
30 – < 45 46 47
45 – < 60 43 48
60 – < 75 31 41
75 and over 6 14
Total 260 280

(a) Calculate the median, mean and standard deviation for each
distribution.

(b) Obtain the coefficient of variation for each distribution and
comment on the results.

(c) Obtain the coefficient of skewness for each distribution and
interpret the results.

6. (a) Explain the meaning of absolute and relative measures of
dispersion and compare their use.

(b) In the production department of a firm, the average weekly
earnings are RM366 with a standard deviation of RM28.2.
Calculate the coefficient of variation.

(c) In the administrative department, the earnings (in RM) in a certain
week of all the 10 members are as follows:
237 245 283 296 253 249 236 254 305 242
Calculate the mean and the standard deviation of this department.
Calculate the coefficient of variation.

(d) Compare the variability of earnings of employees between the two
departments.

7. A survey of house prices yielded the following information:


Price of houses (RM’000) Number of houses for sale
Below 50 16
50 and under 60 41
60 and under 70 39
70 and under 80 22
80 and under 90 10
90 and under 100 11
100 and under 110 4
110 and under 120 5
120 and under 130 1
130 and under 140 5
140 and under 150 4
150 and under 160 2
Total 160
(a) Construct a “less than” cumulative frequency distribution and draw
a “less than” ogive.
(b) From the ogive, estimate the median, the lower quartile, the upper
quartile and the quartile deviation.
(c) The government is considering a stamp duty rebate for the 10% most
expensive houses. Estimate the cut-off point for the prices.

Answers:
1. (a) 13 min (b) 6.7 min, 14.6 min (c) 5 min, 15.25 min (d) 5.125 min
2. (a) RM 1.80, RM 3.50 (b) RM 1.70 (c) RM 0.95
3. (a) 10.587 sales, 19.833 sales, 4.62 sales, 6.07 sales
4. (a) 5.6 gallons, 2.58 gallons. (b) Garage B (CV = 55%)
5. (a) Males: 28.98 years, 32.54 years, 21.85 years; Females: 33.19 years,
35.89 years, 23.39 years (b) Males: CV = 67.2%; Females: CV =
65.2%. (c) Males: SK = 0.4888; Females: SK = 0.3463.
6. (b) CV = 7.70% (c) RM260, RM23.9 (d) Admin Dept: CV = 9.19%; hence
the earnings in administration department are more variable than the
earnings in production department.
7. (b) 66 (RM’000), 56 (RM’000), 82 (RM’000), 13 (RM’000) (c) Prices ≥ 112
(RM’000)
