
Illustration

Let us calculate A.D. from Table 10.3


Array of Marks Obtained by Students

Item No.      Marks obtained    Deviation from mean
(Student)     (X)               $|x| = |X - \bar{X}|$
1             50                21
2             51                20
3             55                16
4             59                12
5             68                3
6             70                1
7             75                4
8             80                9
9             86                15
10            92                21
11            95                24
N = 11        ∑X = 781          ∑|x| = 146

$$\bar{X} = \frac{\sum X}{N} = \frac{781}{11} = 71$$

$$\text{A.D.} = \frac{\sum |x|}{N} = \frac{146}{11} = 13.3 \text{ (approx.)}$$
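The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the function name `average_deviation` is an illustrative choice, not part of the text, and the data are the eleven marks from the illustration above.

```python
def average_deviation(values):
    """Average deviation about the arithmetic mean: sum(|X - mean|) / N."""
    n = len(values)
    mean = sum(values) / n
    # Signs are ignored, so each deviation enters as an absolute value.
    return sum(abs(x - mean) for x in values) / n


marks = [50, 51, 55, 59, 68, 70, 75, 80, 86, 92, 95]
print(round(average_deviation(marks), 1))  # 13.3
```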

10.5.4 Average Deviation From Arithmetic Mean (Grouped Data)


In the case of grouped data (a continuous series), we have to take an additional step: we multiply
the deviations from $\bar{X}$ by the corresponding frequencies.
There are two methods of calculating average deviation from grouped data. One is the long
method, in which deviations are taken from the actual mean, i.e. $\bar{X}$. The other is the shortcut
method, in which deviations are taken from an assumed mean. We explain only the long
method here; the shortcut method gives the same result.
Method
The main steps in this method are:
(1) Find the mid-point (X) of each class.
(2) Calculate $\bar{X}$.
(3) Find |x|, i.e., the deviation of each mid-point from $\bar{X}$ ignoring signs ($|x| = |X - \bar{X}|$).
(4) Multiply |x| by the frequency (f).
(5) Take the sum of f|x|.
(6) Divide the sum of f|x|, i.e., ∑f|x|, by N.
Illustration
TABLE 10.4
Frequency Distribution of Marks Obtained by Students

Class      Frequency    Mid-point    Frequency x    Deviation    Frequency x
                        of class     mid-point      from mean    deviation
           f            X            fX             |x|          f|x|
40-50      12           45           540            23           276
50-60      18           55           990            13           234
60-70      23           65           1495           3            69
70-80      27           75           2025           7            189
80-90      15           85           1275           17           255
90-100     5            95           475            27           135
           N = 100                   ∑fX = 6800                  ∑f|x| = 1158

$$\bar{X} = \frac{\sum fX}{N} = \frac{6800}{100} = 68$$

$$\text{A.D.} = \frac{\sum f|x|}{N} = \frac{1158}{100} = 11.58 \text{, or } 11.6 \text{ (approx.)}$$
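The six steps of the long method translate directly into code. The sketch below is illustrative only: the function name `grouped_average_deviation` and the representation of classes as (lower, upper) pairs are assumptions made here, and the data are those of Table 10.4.

```python
def grouped_average_deviation(classes, frequencies):
    """Return (mean, average deviation) for a frequency distribution.

    classes     -- list of (lower limit, upper limit) pairs
    frequencies -- class frequencies, in the same order
    """
    midpoints = [(lo + hi) / 2 for lo, hi in classes]               # step 1
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n   # step 2
    # Steps 3 to 5: |x| for each class, weighted by frequency and summed.
    total = sum(f * abs(m - mean) for f, m in zip(frequencies, midpoints))
    return mean, total / n                                          # step 6


classes = [(40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
frequencies = [12, 18, 23, 27, 15, 5]
mean, ad = grouped_average_deviation(classes, frequencies)
print(mean, round(ad, 2))  # 68.0 11.58
```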

How do we interpret the value of A.D.?

The value of A.D. in our illustration is 11.6 and the value of $\bar{X}$ is 68. This means that,
on average, the marks of the students differ from the mean of 68 by 11.6 marks.

Solved Examples

Example 1.
Calculate average deviation from the data of marks obtained by 10 students.
90, 64, 79, 33, 85, 59, 60, 70, 40, 95
Solution :
S.No.    Marks (X)     Deviation from mean $|x| = |X - \bar{X}|$
1        90            22.5
2        64            3.5
3        79            11.5
4        33            34.5
5        85            17.5
6        59            8.5
7        60            7.5
8        70            2.5
9        40            27.5
10       95            27.5
         ∑X = 675      ∑|x| = 163.0

$$\bar{X} = \frac{675}{10} = 67.5$$

$$\text{A.D.} = \frac{\sum |x|}{N} = \frac{163}{10} = 16.3$$
Example 2.
Find out average deviation from the following items
8, 9, 12, 15, 17, 20, 24
Solution :
S.No.    Size (X)      Deviation from mean $|x| = |X - \bar{X}|$
1        8             7
2        9             6
3        12            3
4        15            0
5        17            2
6        20            5
7        24            9
         ∑X = 105      ∑|x| = 32

$$\bar{X} = \frac{105}{7} = 15$$

$$\text{A.D.} = \frac{\sum |x|}{N} = \frac{32}{7} = 4.57$$
Example 3.
Find average deviation from the following :
Wage (₹) 0-100 100-200 200-300 300-400 400-500
No. of workers 20 30 35 45 20

Solution :
Class        Mid-point    Frequency    Frequency x    Deviation    Frequency x
                                       mid-point      from mean    deviation
             X            f            fX             |x|          f|x|
0 - 100      50           20           1000           210          4200
100 - 200    150          30           4500           110          3300
200 - 300    250          35           8750           10           350
300 - 400    350          45           15750          90           4050
400 - 500    450          20           9000           190          3800
                          N = 150      ∑fX = 39000                 ∑f|x| = 15700

$$\bar{X} = \frac{\sum fX}{N} = \frac{39000}{150} = 260$$

$$\text{A.D.} = \frac{\sum f|x|}{N} = \frac{15700}{150} = 104.67 \text{ (approx.)}$$

Example 4.
Find average deviation from the following:

Marks              0-20    20-40    40-60    60-80    80-100
No. of students    12      21       31       22       14

Solution :
Class       Mid-point    Frequency    Frequency x    Deviation    Frequency x
                                      mid-point      from mean    deviation
            X            f            fX             |x|          f|x|
0 - 20      10           12           120            41           492
20 - 40     30           21           630            21           441
40 - 60     50           31           1550           1            31
60 - 80     70           22           1540           19           418
80 - 100    90           14           1260           39           546
                         N = 100      ∑fX = 5100                  ∑f|x| = 1928

$$\bar{X} = \frac{\sum fX}{N} = \frac{5100}{100} = 51 \text{ marks}$$

$$\text{A.D.} = \frac{\sum f|x|}{N} = \frac{1928}{100} = 19.28 \text{ marks}$$

10.5.5 Average Deviation From Median


The method of calculating average deviation from the median is the same as that of calculating it from
the arithmetic mean, except that the deviations are taken from the median. There is one difference in the result:
average deviation calculated from the median is never higher, and is usually lower, than average deviation
calculated from the arithmetic mean.
(a) Average Deviation From Median (Ungrouped data)
Consider the following example.
Item No.    Marks (X)    Deviation from median (= 70) |d|
1           50           20
2           51           19
3           55           15
4           59           11
5           68           2
6           70           0
7           75           5
8           80           10
9           86           16
10          92           22
11          95           25
                         ∑|d| = 145

$$\text{Med} = \frac{N+1}{2}\text{th item} = \frac{11+1}{2} = 6\text{th item} = 70 \text{ marks}$$

$$\text{A.D.} = \frac{\sum |d|}{N} = \frac{145}{11} = 13.18 \text{ (approx.)}$$
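As a quick check of the ungrouped case, the sketch below computes the average deviation about the median for the same eleven marks. The function name is hypothetical; `statistics.median` is the standard-library routine, which for an odd number of items returns the middle item.

```python
from statistics import median


def average_deviation_about_median(values):
    """Return (median, average of absolute deviations from the median)."""
    med = median(values)
    return med, sum(abs(x - med) for x in values) / len(values)


marks = [50, 51, 55, 59, 68, 70, 75, 80, 86, 92, 95]
med, ad_median = average_deviation_about_median(marks)
print(med, round(ad_median, 2))  # 70 13.18
```

For these marks the result is slightly below the 13.3 obtained from the arithmetic mean of 71.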

(b) Average Deviation from Median (Grouped data)

Consider the following example.
Class       Frequency    Mid-point    Cumulative    Deviation from    Frequency x
                                      frequency     median (= 64)     deviation
            (f)                       (cf)          |d|               f|d|
40 - 50     12           45           12            19                228
50 - 60     18           55           30            9                 162
60 - 70     20           65           50            1                 20
70 - 80     30           75           80            11                330
80 - 90     15           85           95            21                315
90 - 100    5            95           100           31                155
                                                                      ∑f|d| = 1210

$$\text{Med} = l_1 + \frac{\frac{N}{2} - \sum f_1}{f_{med}} \times i = 60 + \frac{50 - 30}{50} \times 10 = 60 + \frac{20 \times 10}{50} = 60 + \frac{200}{50} = 64$$

$$\text{A.D.} = \frac{\sum f|d|}{N} = \frac{1210}{100} = 12.1$$
10.5.6 Characteristics of Average Deviation (AD)
(1) AD is based on all values.
(2) AD calculated from the median is never higher, and is usually lower, than AD calculated from the
arithmetic mean.
(3) AD ignores the signs of deviations.
(4) AD cannot be calculated from open-ended series, because the mid-points of open-ended
classes are not known.
10.6 STANDARD DEVIATION
10.6.1 Introduction
Standard deviation (SD) is a special form of average deviation. It is different from average
deviation in two respects. First, in computing standard deviation the average used is always the
arithmetic mean. Second, the deviations from mean are squared. The second step is helpful in avoiding
the problem involved in disregarding signs. Standard deviation is considered as the most important
measure of variation.
10.6.2 Methods
There are two methods of calculating Standard deviation, the long method and the short method. In
the long method the deviations are taken from the actual mean. In the short method the deviations are
taken from the assumed mean. Both the methods are used for ungrouped data and grouped data. Let us
first explain the case of ungrouped data.
10.6.3 Standard Deviation (Ungrouped Data)
Long method
The main steps are:
1. Find the arithmetic mean ($\bar{X}$).
2. Take deviations from the mean ($x = X - \bar{X}$).
3. Take the square of the deviations ($x^2$).
4. Take the sum of squares ($\sum x^2$).
5. Divide the sum of squares ($\sum x^2$) by the number of items (N), i.e., find $\frac{\sum x^2}{N}$.
6. Extract the square root of $\frac{\sum x^2}{N}$, i.e., find $\sqrt{\frac{\sum x^2}{N}}$.
The result is standard deviation.

Thus, standard deviation
$$\sigma = \sqrt{\frac{\sum x^2}{N}}$$
where σ = standard deviation.

The Greek letter σ (sigma) is the symbol generally used for standard deviation.
Illustration
Let us calculate standard deviation from data in Table 10.3. The first 3 columns of the table which
were used for computing average deviation are also relevant for computing standard deviation. We
have added the 4th column recording square of deviations (x2).

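$$\bar{X} = \frac{\sum X}{N} = \frac{781}{11} = 71 \text{ marks}$$

$$\sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{2590}{11}} = \sqrt{235.4545} = 15.34 \text{ (approx.)}$$

Shortcut method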

Let us use the same series of data as in Table 10.3 but prepare a fresh table of computation of
standard deviation by shortcut method. In this method, we take deviations from assumed mean instead
of actual mean. The basic method is the same. But since we are taking deviations from the assumed
mean, we also take the correction factor to eliminate the difference between the actual mean and the
assumed mean. The main steps in the method are the following :
1. Choose some value as the assumed mean ($\bar{X}_d$), preferably one of the mid points.
2. Take deviations (x) from the assumed mean.
3. Take the square of the deviations ($x^2$).
4. Take the sum of squares of deviations ($\sum x^2$).
5. Take the sum of deviations from the assumed mean ($\sum x$).
6. Divide $\sum x^2$ by N, i.e., find $\frac{\sum x^2}{N}$.
7. Divide $\sum x$ by N, i.e., find $\frac{\sum x}{N}$.
8. Take the square of $\frac{\sum x}{N}$, i.e., find $\left(\frac{\sum x}{N}\right)^2$.
9. Subtract $\left(\frac{\sum x}{N}\right)^2$ from $\frac{\sum x^2}{N}$, i.e., find $\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2$. The square root of this difference is the standard deviation.
Let us calculate standard deviation from Table 10.5
TABLE 10.5
Computation of Standard Deviation From an Array of Marks Obtained by Students
Item No.    Marks    Deviation from assumed    Square of
                     mean (70)                 deviations
            X        $x = X - \bar{X}_d$       $x^2$
1           50       -20                       400
2           51       -19                       361
3           55       -15                       225
4           59       -11                       121
5           68       -2                        4
6           70       0                         0
7           75       +5                        25
8           80       +10                       100
9           86       +16                       256
10          92       +22                       484
11          95       +25                       625
N = 11               ∑x = 11                   ∑x² = 2601
Assumed mean = 70

$$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} = \sqrt{\frac{2601}{11} - \left(\frac{11}{11}\right)^2} = \sqrt{236.4545 - 1} = \sqrt{235.4545}$$

= 15.34 approx.
The value of standard deviation is the same by the short cut method. This method saves us from
computing X , i.e., actual arithmetic mean.
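The agreement between the two methods can be verified numerically. The following is a minimal sketch, with `sd_long` and `sd_shortcut` as illustrative names chosen here; both divide by N, as the text does throughout.

```python
from math import sqrt


def sd_long(values):
    """Long method: deviations are taken from the actual mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_shortcut(values, assumed_mean):
    """Shortcut method: deviations from an assumed mean, minus the correction."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    # sum(d^2)/N overstates the variance; (sum(d)/N)^2 is the correction.
    return sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


marks = [50, 51, 55, 59, 68, 70, 75, 80, 86, 92, 95]
print(round(sd_long(marks), 2))          # 15.34
print(round(sd_shortcut(marks, 70), 2))  # 15.34
```

Any other assumed mean, for example 60, gives the same standard deviation, because the correction term changes with it.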
10.6.4 Standard Deviation (Grouped Data)
Introduction
The basic method of computing standard deviation from grouped data is the same. We have to take
an additional step. Since there are classes, we have to first find out the mid-point of each class. This
mid-point is then taken as the representative value of each class.
Like in case of ungrouped data, there are two methods of computing standard deviation from
grouped data : long method and short method. In the long method we take deviations from actual
arithmetic mean ( X ). In the short method we take deviations from the assumed mean ( X d). Let us now
study the two methods.
Long Method
Let us use the data in Table 10.4 as an illustration. The main steps are :
(1) Find out mid-point of each class.
(2) Calculate $\bar{X}$, i.e., the actual arithmetic mean.
(3) Find the deviation (x) of each mid-point from $\bar{X}$.
(4) Take the square of x, i.e., $x^2$.
(5) Multiply $x^2$ by the frequency, i.e., $fx^2$.
(6) Apply the following formula:

$$\sigma = \sqrt{\frac{\sum f x^2}{N}}$$
The above steps are summed up in Table 10.6
TABLE 10.6
Computation of Standard Deviation by Long Method from Frequency Distribution of Marks
Obtained by Students
Class      Frequency     Mid-point    Mid-point x    Deviation             Square of     Square of deviation
(Marks)    (Students)    of class     frequency      from mean (68)        deviation     x frequency
           f             X            fX             $x = X - \bar{X}$     $x^2$         $fx^2$
40-50      12            45           540            -23                   529           6348
50-60      18            55           990            -13                   169           3042
60-70      23            65           1495           -3                    9             207
70-80      27            75           2025           +7                    49            1323
80-90      15            85           1275           +17                   289           4335
90-100     5             95           475            +27                   729           3645
           N = 100                    ∑fX = 6800                                         ∑fx² = 18900

$$\bar{X} = \frac{\sum fX}{N} = \frac{6800}{100} = 68 \text{ marks}$$

$$\sigma = \sqrt{\frac{\sum f x^2}{N}} = \sqrt{\frac{18900}{100}} = \sqrt{189} = 13.75 \text{ marks (approx.)}$$
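A small sketch of the long method for grouped data follows. The function name `grouped_sd` is an assumption made for illustration; the mid-points and frequencies are those of Table 10.6.

```python
from math import sqrt


def grouped_sd(midpoints, frequencies):
    """Long method for grouped data: sqrt(sum(f * x^2) / N), x = X - mean."""
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    sum_fx2 = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))
    return sqrt(sum_fx2 / n)


midpoints = [45, 55, 65, 75, 85, 95]
frequencies = [12, 18, 23, 27, 15, 5]
print(round(grouped_sd(midpoints, frequencies), 2))  # 13.75
```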

Short Cut Method


In this method, instead of taking deviations from the actual mean ($\bar{X}$), we assume some value as the
mean ($\bar{X}_d$) and take deviations from this assumed mean. Since the assumed mean is different from the actual
mean, an error will arise. We must correct for the error. The
steps in this method are :
(1) Find the mid-point of each class.
(2) Assume some value as the mean, preferably one of the mid-points ($\bar{X}_d$).
(3) Take deviations from the assumed mean (x).
(4) Take step deviations from the assumed mean by dividing x by the class interval (i), i.e., $x' = \frac{x}{i}$.
(5) Multiply x' by the frequency, i.e., fx'.
(6) Take the square of the step deviations, i.e., $x'^2$.
(7) Multiply $x'^2$ by the frequency, i.e., $fx'^2$.
(8) Apply the following formula:

$$\sigma = i \times \sqrt{\frac{\sum f x'^2}{N} - \left(\frac{\sum f x'}{N}\right)^2}$$

where i = class interval and $\left(\frac{\sum f x'}{N}\right)^2$ = the correction factor.

The expression under the square root has two parts: $\frac{\sum f x'^2}{N}$ and $\left(\frac{\sum f x'}{N}\right)^2$. The first part is the usual long method. The
second part is the correction factor, needed because we have taken deviations from the assumed mean ($\bar{X}_d$) and not
from the actual mean ($\bar{X}$). If we had taken deviations from $\bar{X}$, there would be no need for the correction factor. The
result is multiplied by i because in step 4 we divided the deviation (x) by i to obtain the step deviation.
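The reason the correction factor removes the error exactly can be seen from a short derivation. It is a sketch in the chapter's notation; the symbol $\bar{x}'$ (the mean step deviation, $\sum f x'/N$) is shorthand introduced here and is not used elsewhere in the text.

```latex
% Each mid-point is the assumed mean plus i times its step deviation:
%   X = \bar{X}_d + i x'   so   \bar{X} = \bar{X}_d + i \bar{x}',
%   where \bar{x}' = \sum f x' / N.
\begin{aligned}
X - \bar{X} &= i\,(x' - \bar{x}') \\
\frac{\sum f (X - \bar{X})^2}{N}
  &= i^2 \, \frac{\sum f (x' - \bar{x}')^2}{N} \\
  &= i^2 \left[ \frac{\sum f x'^2}{N}
       - 2\,\bar{x}'\,\frac{\sum f x'}{N} + \bar{x}'^2 \right] \\
  &= i^2 \left[ \frac{\sum f x'^2}{N}
       - \left( \frac{\sum f x'}{N} \right)^2 \right] \\
\sigma &= i \sqrt{ \frac{\sum f x'^2}{N}
       - \left( \frac{\sum f x'}{N} \right)^2 }
\end{aligned}
```

The last line is the shortcut formula, so the result does not depend on which value is chosen as the assumed mean.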
Illustration
We will use the same frequency distribution as given in Table 10.4 but recast the whole table
because we now take deviations from an assumed mean.
TABLE 10.7
Computation of Standard Deviation by Short Cut Method for Frequency Distribution of Marks
obtained By students
Class      Frequency    Mid-point    Deviation from    Step          Step deviation    Square of step    Square of step
                                     assumed mean      deviation     x frequency       deviation         deviation x
                                     (65)              (x ÷ i)                                           frequency
           f            X            x                 x'            fx'               x'²               fx'²
40-50      12           45           -20               -2            -24               4                 48
50-60      18           55           -10               -1            -18               1                 18
60-70      23           65           0                 0             0                 0                 0
70-80      27           75           +10               +1            +27               1                 27
80-90      15           85           +20               +2            +30               4                 60
90-100     5            95           +30               +3            +15               9                 45
           N = 100                                                   ∑fx' = +30                          ∑fx'² = 198
Assumed mean = 65

$$\sigma = i \times \sqrt{\frac{\sum f x'^2}{N} - \left(\frac{\sum f x'}{N}\right)^2} = 10 \sqrt{\frac{198}{100} - \left(\frac{30}{100}\right)^2}$$

$$= 10 \sqrt{1.98 - (0.3)^2} = 10 \sqrt{1.98 - 0.09} = 10 \sqrt{1.89}$$

= 10 × 1.375 approx. = 13.75 approx.
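The step-deviation computation of Table 10.7 can be sketched as below. The function name `grouped_sd_step` and its parameter names are illustrative assumptions; `width` stands for the class interval i.

```python
from math import sqrt


def grouped_sd_step(midpoints, frequencies, assumed_mean, width):
    """Shortcut (step-deviation) method for grouped data."""
    n = sum(frequencies)
    steps = [(m - assumed_mean) / width for m in midpoints]         # x'
    sum_fx = sum(f * s for f, s in zip(frequencies, steps))         # sum f x'
    sum_fx2 = sum(f * s * s for f, s in zip(frequencies, steps))    # sum f x'^2
    # Multiply back by the class interval because x was divided by it.
    return width * sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


midpoints = [45, 55, 65, 75, 85, 95]
frequencies = [12, 18, 23, 27, 15, 5]
print(round(grouped_sd_step(midpoints, frequencies, 65, 10), 2))  # 13.75
```

Choosing a different mid-point, say 75, as the assumed mean leaves the result unchanged.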
10.7 SOLVED EXAMPLES
Example 1.
Calculate standard deviation from the data on marks obtained by 10 students:
33, 40, 59, 60, 64, 70, 79, 85, 90, 95
Solution :

S. No.    Marks (X)    Deviation $x = X - \bar{X}_d$    Square of
                       ($\bar{X}_d = 70$)               deviation ($x^2$)
1         33           -37                              1369
2         40           -30                              900
3         59           -11                              121
4         60           -10                              100
5         64           -6                               36
6         70           0                                0
7         79           +9                               81
8         85           +15                              225
9         90           +20                              400
10        95           +25                              625
          ∑X = 675     ∑x = -25                         ∑x² = 3857
Assumed mean = 70

$$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} = \sqrt{\frac{3857}{10} - \left(\frac{-25}{10}\right)^2}$$

$$= \sqrt{385.7 - 6.25} = \sqrt{379.45} = 19.48 \text{ marks (approx.)}$$
Example 2.
Calculate standard deviation from the following items.
8, 9, 12, 15, 17, 20, 24
Solution :
S. No.    Size (X)     Deviation from mean      Square of dev.
                       $x = X - \bar{X}$        ($x^2$)
1         8            -7                       49
2         9            -6                       36
3         12           -3                       9
4         15           0                        0
5         17           +2                       4
6         20           +5                       25
7         24           +9                       81
N = 7     ∑X = 105                              ∑x² = 204

$$\bar{X} = \frac{105}{7} = 15$$

$$\sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{204}{7}} = \sqrt{29.14} = 5.40 \text{ (approx.)}$$

Example 3.
Find standard deviation from the following :
Wage (₹)          0-100    100-200    200-300    300-400    400-500
No. of workers    20       30         35         45         20

Solution :
Class       Frequency    Mid-point    Mid-point x    Deviation              Square of     Square of dev.
                                      frequency      from mean              deviation     x frequency
            f            X            fX             $x = X - \bar{X}$      $x^2$         $fx^2$
0-100       20           50           1000           -210                   44100         882000
100-200     30           150          4500           -110                   12100         363000
200-300     35           250          8750           -10                    100           3500
300-400     45           350          15750          +90                    8100          364500
400-500     20           450          9000           +190                   36100         722000
            N = 150                   ∑fX = 39000                                         ∑fx² = 2335000

$$\bar{X} = \frac{\sum fX}{N} = \frac{39000}{150} = 260$$

$$\sigma = \sqrt{\frac{\sum f x^2}{N}} = \sqrt{\frac{2335000}{150}} = \sqrt{15566.67} = 124.77 \text{ (approx.)}$$

Example 4.
Calculate standard deviation by shortcut method.
Marks              0-20    20-40    40-60    60-80    80-100
No. of students    12      21       31       22       14

Solution :
Class       Mid-point    Frequency    Deviation from           Step dev.              Step dev. x    Square of    Square of step
                                      assumed mean (50)        $x' = \frac{x}{i}$     frequency      step dev.    dev. x frequency
            (X)          (f)          $x = X - \bar{X}_d$      x'                     fx'            x'²          fx'²
0-20        10           12           -40                      -2                     -24            4            48
20-40       30           21           -20                      -1                     -21            1            21
40-60       50           31           0                        0                      0              0            0
60-80       70           22           +20                      +1                     +22            1            22
80-100      90           14           +40                      +2                     +28            4            56
                         N = 100                                                      ∑fx' = +5                   ∑fx'² = 147
Assumed mean = 50

$$\sigma = i \times \sqrt{\frac{\sum f x'^2}{N} - \left(\frac{\sum f x'}{N}\right)^2} = 20 \sqrt{\frac{147}{100} - \left(\frac{5}{100}\right)^2} = 20 \sqrt{1.47 - 0.0025}$$

$$= 20 \sqrt{1.4675} = 20 \times 1.211 = 24.23 \text{ marks}$$

Example 5.
Solve example 3 by the assumed mean method.
Solution :
Class        Frequency    Mid-point    Deviation from         Step dev.    Step dev. x    Square of    Square of step
                                       assumed mean (250)                  frequency      step dev.    dev. x frequency
             (f)          (X)          x                      x'           fx'            x'²          fx'²
0-100        20           50           -200                   -2           -40            4            80
100-200      30           150          -100                   -1           -30            1            30
200-300      35           250          0                      0            0              0            0
300-400      45           350          +100                   +1           +45            1            45
400-500      20           450          +200                   +2           +40            4            80
             N = 150                                                       ∑fx' = +15                  ∑fx'² = 235
Assumed mean = 250.

$$\sigma = i \times \sqrt{\frac{\sum f x'^2}{N} - \left(\frac{\sum f x'}{N}\right)^2} = 100 \sqrt{\frac{235}{150} - \left(\frac{15}{150}\right)^2}$$

$$= 100 \sqrt{1.5667 - (0.1)^2} = 100 \sqrt{1.5667 - 0.01} = 100 \sqrt{1.5567}$$

$$= 100 \times 1.2477 = 124.77$$
II. MEASURES OF RELATIVE DISPERSION

10.8 NEED FOR MEASURES OF RELATIVE DISPERSION


Range, interquartile range, mean deviation and standard deviation are all measures of absolute
variation. Absolute measures have two limitations which make them unsuitable for comparison
between two series of data.
First, the measures of absolute dispersion may be expressed in different units, such as rupees,
percentages, marks, etc. So, if the absolute dispersion of one series is expressed in rupees and that
of another in percentages, the two cannot be directly compared. It is therefore necessary to convert the
results of the two series into comparable units.
Second, the measures of absolute dispersion are not expressed relative to the magnitude around
which the dispersion is measured. For example, suppose there are two series of marks obtained by
students whose X and σ are :
X σ
Series A 80 10
Series B 40 10
The two series have different means but same standard deviation. This offers no basis of
comparison in itself. If we interpret these results in terms of percentages, the percentage variation in
series A is much less than in the series B. So, we cannot find comparable variation unless the absolute
measure of dispersion is related to some average.
Similarly, if both the series have the same X but different standard deviation, the two cannot be
compared.
The above limitations can be removed by dividing each measure of absolute dispersion by its
respective average. There are four measures of relative dispersion. (1) coefficient of variation, (2)
coefficient of mean deviation (3) coefficient of range and (4) coefficient of quartile deviation. Out of
these the one most often used is coefficient of variation.

10.9 COEFFICIENT OF VARIATION


10.9.1 Measure
This measure of relative dispersion was developed by Karl Pearson. It is represented by symbol “V”.
The formula for it is
$$V = \frac{\sigma}{\bar{X}} \times 100 = \frac{\text{Standard deviation}}{\text{Arithmetic mean}} \times 100$$
Illustration 1
Suppose, we are given the following information by two series of data about marks obtained by
students
X (marks) σ (marks)
Series A 80 10
Series B 40 10

Then,
$$V \text{ of series A} = \frac{10}{80} \times 100 = \frac{1000}{80} = 12.5\%$$
$$V \text{ of series B} = \frac{10}{40} \times 100 = \frac{1000}{40} = 25\%$$

It is clear from the above that variation is relatively much larger in series B than in series A. (Also
note that absolute dispersion is same, in both the series). Thus, the comparison between the two series
can be made only through the use of measures of relative dispersion.
Illustration 2
Now we take an illustration in which arithmetic means are the same but standard deviations differ.
Suppose, following is known about the daily wage of two groups of workers :

             $\bar{X}$ (wage)    σ (wage)
x group      50                  5
y group      50                  10

$$V \text{ of x group} = \frac{\sigma}{\bar{X}} \times 100 = \frac{5}{50} \times 100 = 10\%$$
$$V \text{ of y group} = \frac{10}{50} \times 100 = 20\%$$

Therefore, variation is greater in the y group than in the x group, even though both groups
have the same arithmetic mean.
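Both illustrations reduce to one line of arithmetic, sketched here with a hypothetical helper named `coefficient_of_variation`; the inputs are the means and standard deviations quoted in Illustration 1.

```python
def coefficient_of_variation(sd, mean):
    """V = (standard deviation / arithmetic mean) x 100, in per cent."""
    return sd / mean * 100


print(coefficient_of_variation(10, 80))  # 12.5  (series A)
print(coefficient_of_variation(10, 40))  # 25.0  (series B)
```

Because V is a pure percentage, it can be compared across series measured in different units or around different means.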
10.9.2 Solved Examples
Example 1.
Find the coefficient of variation from data in Example 1 in section 10.7
Solution:

$$\bar{X} = \frac{\sum X}{N} = \frac{675}{10} = 67.5$$

σ = 19.48

$$V = \frac{\sigma}{\bar{X}} \times 100 = \frac{19.48}{67.5} \times 100 = 28.86\%$$
Example 2.
Find coefficient of variation from data in example 2 in section 10.7.
Solution :
$\bar{X} = 15$
σ = 5.4
$$V = \frac{\sigma}{\bar{X}} \times 100 = \frac{5.4}{15} \times 100 = 36\%$$
Example 3.
Calculate coefficient of variation from data in example 3 in section 10.7.
Solution :
$\bar{X} = 260$
σ = 124.77
$$V = \frac{124.77}{260} \times 100 = \frac{12477}{260} = 47.99\%$$

10.10 COEFFICIENT OF RANGE

10.10.1 The measure
Range is the difference between the highest and lowest values in a series. Coefficient of range can
be defined as the ratio between this difference and the sum of these values. Its formula is :
$$V_R = \frac{\text{Highest value} - \text{Lowest value}}{\text{Highest value} + \text{Lowest value}}$$

10.10.2 Solved Examples


Illustration 1.
Refer to the Table 10.1 in which the highest value is 95 marks and the lowest value is 50 marks.
Therefore,
$$V_R = \frac{95 - 50}{95 + 50} = \frac{45}{145} = 0.31 \text{ (approx.)}$$

Illustration 2.
Refer to the Table 10.2 in which the upper limit of the class of the highest value is 100 marks and
the lower limit of the class of the lowest value is 40 marks. Therefore,

$$V_R = \frac{100 - 40}{100 + 40} = \frac{60}{140} = 0.43 \text{ (approx.)}$$

Illustration 3.
Refer to the example 3 in section 10.4.5 in which the upper limit of the class of the highest value is
₹ 500 and the lower limit of the class of the lowest value is 0. Therefore,
$$V_R = \frac{500 - 0}{500 + 0} = \frac{500}{500} = 1$$
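The three illustrations can be reproduced with the short sketch below; `coefficient_of_range` is an illustrative name, and for grouped data the caller is assumed to pass the upper limit of the highest class and the lower limit of the lowest class.

```python
def coefficient_of_range(highest, lowest):
    """V_R = (highest - lowest) / (highest + lowest)."""
    return (highest - lowest) / (highest + lowest)


print(round(coefficient_of_range(95, 50), 2))   # 0.31
print(round(coefficient_of_range(100, 40), 2))  # 0.43
print(coefficient_of_range(500, 0))             # 1.0
```

The last call shows why the coefficient is always 1 when the lowest value is zero: numerator and denominator are then equal.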
10.10.3 Limitations
First, coefficient of Range is not considered dependable in distributions with extreme values.
Second, it is not possible to calculate coefficient of range in open ended distributions.
10.11 COEFFICIENT OF QUARTILE DEVIATION
10.11.1 Measure
It is defined as the ratio of the difference between the third and first quartiles to the sum of these quartiles.
Its formula is
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

This measure is considered more appropriate in open-end distributions and distribution with
extreme values.

10.11.2 Solved examples


Illustration 1.
Refer to the illustration in Section 10.4.3 in which the values of $Q_1$ and $Q_3$ have been calculated as
55 marks and 86 marks respectively. The value of $V_Q$ in this case is
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{86 - 55}{86 + 55} = \frac{31}{141} = 0.22 \ (= 22\%)$$

The coefficient can be expressed in decimal form or percentage form.

Illustration 2.
Refer to illustration in section 10.4.4 in which the values of Q 1 and Q3 are 57.3 marks and 78.1
marks respectively. The value of VQ in this case is
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{78.1 - 57.3}{78.1 + 57.3} = 0.15 \ (= 15\%)$$
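Given the two quartiles, the coefficient is a single ratio. The sketch below uses a hypothetical function name and takes the quartile values from the two illustrations as given; it does not compute the quartiles themselves.

```python
def coefficient_of_quartile_deviation(q1, q3):
    """V_Q = (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


print(round(coefficient_of_quartile_deviation(55, 86), 2))      # 0.22
print(round(coefficient_of_quartile_deviation(57.3, 78.1), 2))  # 0.15
```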

10.11.3 Solved Examples


Example 1.
On the basis of data in Example 1 of Section 9.5 of Chapter 9, calculate the value of coefficient
of Quartile Deviation.
Solution:
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{86.25 - 54.25}{86.25 + 54.25} = \frac{32}{140.5} = 0.23 \ (= 23\%)$$
Example 2.
On the basis of data in Example 2 of Section 9.5 of Chapter 9, calculate the value of coefficient
of Quartile Deviation.

Solution :
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{20 - 9}{20 + 9} = \frac{11}{29} = 0.38 \ (= 38\%)$$

Example 3.
On the basis of data in Example 3 of Section 9.5 of Chapter 9, calculate the value of coefficient
of Quartile Deviation.
Solution:
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{359.375 - 154.170}{359.375 + 154.170} = \frac{205.205}{513.545} = 0.40 \ (= 40\%)$$

Example 4.
On the basis of data in Example 4 of Section 9.5 of Chapter 9, calculate the coefficient of
Quartile Deviation.
Solution:
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{68.89 - 34.12}{68.89 + 34.12} = \frac{34.77}{103.01} = 0.34 \ (= 34\%)$$

Example 5.
On the basis of data in Example 5 of Section 9.5 of Chapter 9, calculate the coefficient of
Quartile Deviation.

Solution:
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{70.830 - 34.375}{70.830 + 34.375} = \frac{36.455}{105.205} = 0.35 \ (= 35\%)$$

10.12 COEFFICIENT OF AVERAGE DEVIATION (VAD)


10.12.1 Measure
It is defined as the ratio of average deviation to the mean or the median. It is also called the
coefficient of mean deviation.
$$V_{A.D.} = \frac{\text{Average deviation}}{\text{Mean}} = \frac{A.D.}{\bar{X}}$$
Illustration 1.
Refer to the illustration in Section 10.5.3 where $\bar{X} = 71$ and A.D. = 13.3. The value of $V_{A.D.}$ in this
case is:
$$V_{A.D.} = \frac{A.D.}{\bar{X}} = \frac{13.3}{71} = 0.19 \ (= 19\%)$$
Illustration 2.
Refer to the illustration in section 10.5.4 where $\bar{X} = 68$ and A.D. = 11.6. The value of $V_{A.D.}$ in this case
is:
$$V_{A.D.} = \frac{11.6}{68} = 0.17 \ (= 17\%)$$
Illustration 3.
Refer to the illustration in section 10.5.5 where median = 64 and A.D. = 12.1. The value of $V_{A.D.}$ in
this case is:
$$V_{A.D.} = \frac{A.D.}{\text{Median}} = \frac{12.1}{64} = 0.19 \ (= 19\%)$$
Solved Examples
Example 1.
On the basis of data in Example 1 of Section 10.5.5 in chapter 10, calculate the coefficient of
Mean Deviation.
Solution :
$$V_{A.D.} = \frac{A.D.}{\bar{X}} = \frac{16.3}{67.5} = 0.24 \ (= 24\%)$$
Example 2.
On the basis of data in Example 2 in Section 10.5.5 of Chapter 10, calculate coefficient of Mean
Deviation.
Solution :
$$V_{A.D.} = \frac{A.D.}{\bar{X}} = \frac{4.57}{15} = 0.30 \ (= 30\%)$$
Example 3.
On the basis of data in Example 3 in Section 10.5.5 in chapter 10, calculate coefficient of Mean
Deviation.
Solution :
$$V_{A.D.} = \frac{A.D.}{\bar{X}} = \frac{104.67}{260} = 0.40 \ (= 40\%)$$
Example 4.
On the basis of data in Example 4 of Section 10.5.5 in chapter 10, calculate coefficient of Mean
Deviation.
Solution :
$$V_{A.D.} = \frac{A.D.}{\bar{X}} = \frac{19.28}{51} = 0.38 \ (= 38\%)$$

10.13 LORENZ CURVE

10.13.1 Introduction
Lorenz Curve is a type of measure of dispersion, but more of a graphical measure. It shows how
a distribution deviates from equal distribution. The traditional techniques of dispersion are statistical
techniques. The Lorenz curve is different in the sense that it is a graphical technique.
10.13.2 Meaning
Lorenz Curve shows the extent of departure between an “equal distribution” and the “actual
distribution” of a variable. When applied to income distribution, it shows the extent of departure of
actual distribution of income from an equal distribution. It is known as the best summary measure of
inequality.
When each household gets the same income, the income distribution is said to be equal. But, in
the real world, the income distribution is never equal. Since the circumstances in which households
are placed differ, the income earned by them also differs. Limited inequality is understandable. Large
scale inequality, however, is undesirable. This is why governments take steps to reduce inequality.
To do so they must know the extent of departure from equal distribution. The Lorenz Curve technique is
actively used for this purpose.

10.13.3 Steps in drawing


The following steps are taken to draw the curve.
1. Arrange the households in ascending order of income. We start with poorest household and
end with the richest household.
2. Convert the distribution into percentage distribution. We select a class interval, say 5%,
10%, etc. The poorest, say 20% make the first group. The next 20% in the ascending order
make the second group. In this way the richest 20% makes the last group.
3. Express the distribution in cumulative percentages.
As an example, we take distribution of national income in India for the years 2004 and 2015
(Hypothetical data).
TABLE 10.8
Distribution of National Income in India
                        Share of National Income (%)
Income groups           2004                  2015
                        %        cum. %       %        cum. %
Poorest 20%             7.0      7.0          8.1      8.1
Next 20%                9.2      16.2         11.6     19.7
Next 20%                13.9     30.1         15.0     34.7
Next 20%                20.5     50.6         19.3     54.0
Richest 20%             49.4     100.0        46.0     100.0
All                     100.0    100.0        100.0    100.0

In the next step, represent the distribution graphically. Represent households on the x-axis and
income on the y-axis. Make the diagram a square box diagram. Join the lower left hand corner and
the upper right hand corner with a straight line.
Let the box be ABCD. The diagonal AC then joins the two corners. Now draw the actual
distributions, which will lie somewhere between AC and ABC.
Interpretation
Note that there are four curves, described as follows :
AC : Represents equal distribution, i.e. perfect equality.
ABC : Represents perfect inequality, with all the income going to the last group.
AEC : Actual distribution for the year 2015.
AFC : Actual distribution for the year 2004.
The curves AC and ABC are simply two ‘never to exist’ extremes: AC shows perfect equality,
while ABC shows perfect inequality. Neither of the two situations is possible in the real world. The
actual distribution must lie somewhere between AC and ABC. Of the two, AC is the reference
curve. The farther the actual distribution curve lies from the equal distribution curve, the higher the
degree of inequality.
10.13.4 The Gini Coefficient of Inequality
The Lorenz curve is useful mainly for ranking inequality between different nations and over
different time periods within the same nation. However, the curve by itself cannot be used to measure the
precise degree of inequality. We can do that with the Gini coefficient of inequality. In terms of the Lorenz
diagram, it is the numerical value of the area between the Lorenz curve and the diagonal line of
equal distribution, divided by the entire area beneath the diagonal line.
$$\text{Gini coefficient} = \frac{\text{Inequality area}}{\text{Triangle area}}$$
With reference to the year 2015:
$$\text{Gini coefficient} = \frac{\text{Area between the diagonal AC and the curve AEC}}{\text{Area of the triangle ABC}}$$
The value of the coefficient may vary from 0 to 1. In case of equal distribution, it will be zero. In case of
perfect inequality it is 1. In case of actual distribution it will lie between 0 and 1.
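A rough numerical version of this area ratio can be computed from the income shares of Table 10.8. The sketch below is an approximation under stated assumptions: it joins the points of the Lorenz curve by straight lines (the trapezoid rule), treats the five groups as equal in size, and uses a function name, `gini_from_group_shares`, chosen here for illustration. With both axes scaled from 0 to 1 the triangle has area 0.5, so the ratio simplifies to 1 minus twice the area under the Lorenz curve.

```python
def gini_from_group_shares(shares):
    """Approximate Gini coefficient from income shares of equal-sized groups.

    shares -- income share of each group (any unit), poorest group first
    """
    total = sum(shares)
    k = len(shares)
    cumulative = 0.0
    area_under_lorenz = 0.0
    for s in shares:
        previous = cumulative
        cumulative += s / total
        # Trapezoid over one group: width 1/k, mean height of its two ends.
        area_under_lorenz += (previous + cumulative) / 2 / k
    # Inequality area / triangle area = (0.5 - area) / 0.5
    return 1 - 2 * area_under_lorenz


shares_2004 = [7.0, 9.2, 13.9, 20.5, 49.4]
shares_2015 = [8.1, 11.6, 15.0, 19.3, 46.0]
print(round(gini_from_group_shares(shares_2004), 3))  # 0.384
print(round(gini_from_group_shares(shares_2015), 3))  # 0.334
```

On these hypothetical data the coefficient is lower for 2015 than for 2004, so the 2015 distribution is the less unequal of the two. With only five groups the straight-line approximation somewhat understates the true area between the curve and the diagonal.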

POINTS TO REMEMBER
• Dispersion means distance from the average.
• Dispersion is of two types : absolute and relative dispersion.
• Main measures of absolute dispersion are: (i) range, (ii) semi-inter quartile range (quartile
deviation), (iii) average deviation and (iv) standard deviation.
• Main measures of relative dispersion are:
(i) Coefficient of quartile deviation
(ii) Coefficient of average deviation
(iii) Coefficient of variation
• Range in the ungrouped data is the difference between the highest and the lowest values in
the series.
• Range, in grouped data, is the difference between the upper limit of the class of the highest
value and the lower limit of the class of the lowest value.
• Semi-inter quartile range (quartile deviation) equals the difference between third and first
quartile divided by 2.
• The two characteristics of semi-inter quartile range are :
(i) Not affected by extreme values
(ii) Possible to find value in case of open end distribution
• Average (mean) deviation (AD) is the average distance (ignoring pluses and minuses) of the
items of a series from their average (arithmetic mean or median). The method of calculation
of AD is the same whether calculated from arithmetic mean or median. The answers may be
different.
• AD, in ungrouped data = $\frac{\sum |x|}{N}$
• AD, in grouped data = $\frac{\sum f|x|}{N}$
• Standard deviation (SD) is a special form of A.D.
• SD in ungrouped data by long method = $\sqrt{\frac{\sum x^2}{N}}$
• SD in ungrouped data by shortcut method = $\sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2}$
• SD in grouped data by long method = $\sqrt{\frac{\sum f x^2}{N}}$
• SD in grouped data by shortcut method = $i \times \sqrt{\frac{\sum f x'^2}{N} - \left(\frac{\sum f x'}{N}\right)^2}$
• There is need for a measure of relative dispersion on account of two limitations of measures
of absolute dispersion : (i) The measures of absolute dispersion are not comparable because
they may be expressed in different units and (ii) a measure of absolute dispersion is not
expressed relative to the magnitude around which the dispersion is measured.
• Coefficient of variation (V) is the most often used measure of relative dispersion.
$$V = \frac{\text{Standard deviation}}{\text{Arithmetic mean}} \times 100 = \frac{\sigma}{\bar{X}} \times 100$$
• Coefficient of range = $V_R = \frac{\text{Highest value} - \text{Lowest value}}{\text{Highest value} + \text{Lowest value}}$
• Coefficient of Quartile Deviation is: $V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
• Coefficient of Average Deviation is: $V_{A.D.} = \frac{A.D.}{\bar{X}}$ or $V_{A.D.} = \frac{A.D.}{\text{Median}}$
• Lorenz curve shows how actual distribution deviates from equal distribution.
• ‘Gini coefficient’ measures the precise degree of deviation of actual distribution from equal
distribution.
EXERCISES
MULTIPLE CHOICE QUESTIONS [1 Mark]
(Answers at the end of exercises)
Choose the correct alternative in the following questions :
1. Quartile deviation is not affected by the extreme values because it is based on :
(a) 25% of the central value of the series
(b) 50% of the central value of the series
(c) 75% of the central value of the series
(d) 80% of the central value of the series
2. Average deviation can be calculated from
(a) Mean (b) Median
(c) Both mean and median (d) Neither mean nor median
3. Standard deviation is calculated from
(a) Mean (b) Median
(c) Both mean and median (d) Neither mean nor median
4. If the lower limit of the class of the lowest value is zero, the value of coefficient of range is :
(a) Zero (b) 0.50
(c) 0.75 (d) 1.00
5. If Q1 = 10 and Q3 = 30, the value of the Coefficient of Quartile Deviation is :
(a) 0.25 (b) 0.50 (c) 0.33 (d) 0.75
6. In case of perfect inequality, the value of Gini Coefficient will be :
(a) Zero (b) Infinity (c) 1 (d) -1
SHORT ANSWER QUESTIONS-I [3 Marks]
Answer the following questions in about 60 words.
1. How is range calculated for grouped and ungrouped data?
2. What is the main disadvantage of range?
3. State briefly the method of calculating average deviation from grouped data.
4. Describe briefly the method of calculating standard deviation in ungrouped data.
5. Who developed the measure of coefficient of variation? State the method.
6. Explain the method of calculating coefficient of quartile deviation.
7. Name the steps in drawing Lorenz Curve.
8. Explain the method of calculating the coefficient of range.
SHORT ANSWER QUESTIONS-II [4 Marks]
Answer the following questions in about 70 words.
1. Describe the shortcut method of calculating the standard deviation.
2. Explain the need for the measures of relative dispersion.
3. Why is quartile deviation not a suitable measure of dispersion? What are its alternatives and
why?
4. Explain characteristics of range.
5. Explain the method of calculating coefficient of mean deviation.
6. Explain the measure of ‘Gini Coefficient.’
NUMERICAL QUESTIONS
1. Find the range and the coefficient of range of the following series :
40, 38, 52, 34, 62, 54, 42, 65 (Ans. Range = 31, VR = 0.31)
2. Exchange rate of U.S. dollar vis-a-vis Indian rupee (i.e. 1 U.S. dollar = so many Rs) from April to
Dec. 2002 is given below. Find the range and the coefficient of range.
Month       Exchange rate (₹)
April       48.918
May         48.997
June        48.967
July        48.764
August      48.585
Sept.       48.440
Oct.        48.371
Nov.        48.255
Dec.        48.141
(Ans. Range = ₹ 0.856, VR = 0.009)
3. Given below is frequency distribution of state-wise literacy rate in India during 2001. Find the
range, coefficient of range and quartile deviation.
Literacy rate (in per cent)    No. of States/UTs
47 - 58                        4
58 - 69                        11
69 - 80                        8
80 - 91                        9
Total                          32
(Ans. Range = 44%, Q.D. = 9.61, VR = 0.16)
4. Find the semi-interquartile range, coefficient of range and the coefficient of quartile deviation
of the following series :
Age (years)    No. of persons
18             5
19             6
20             7
21             8
22             9
23             8
24             7
25             5
(Ans. Q.D. = 1.5, VQ = 0.07, VR = 0.16)
5. Calculate the coefficient of range and quartile deviation and its coefficient from the following:
Marks    No. of students
0        2
1        3
2        4
3        6
4        10
5        12
6        10
7        6
8        1
9        2
(Ans. Q.D. = 1.5, VQ = 0.33, VR = 1)
6. Find the semi-interquartile range and coefficient of quartile deviation of the frequency
distribution of Literacy rates in States/UT in India during 2001.
Literacy rates (%)    No. of States/UT
47 - 51               1
51 - 55               2
55 - 59               1
59 - 63               3
63 - 67               4
67 - 71               8
71 - 75               2
75 - 79               2
79 - 83               6
83 - 87               0
87 - 91               3
(Ans. Q.D. = 7.84, VQ = 0.11)
7. Calculate the average deviation from the arithmetic mean and coefficient of mean
deviation : 20, 22, 27, 30, 31, 32, 35, 40, 45, 48 (Ans. A.D. = 7.2, VAD = 0.22)
8. The following are the rent of 12 houses. Calculate mean deviation.
500, 525, 470, 400, 535, 475, 460, 570, 620, 425, 590, 490. (Ans. AD = 52.5)
9. Calculate mean deviation and coefficient of mean deviation of the differences of the age
between husband and wife in a particular community.
Differences (in years)    Frequency
0 - 5                     440
5 - 10                    700
10 - 15                   500
15 - 20                   280
20 - 25                   100
25 - 30                   50
30 - 35                   15
35 - 40                   5
(Ans. AD = 5.30, VAD = 0.51)
10. Calculate average deviation and coefficient of average deviation.
Size of item    Frequency
3 - 4           3
4 - 5           7
5 - 6           22
6 - 7           60
7 - 8           85
8 - 9           32
9 - 10          8
(Ans. A.D. = 0.91, VAD = 0.13)
11. Given the height (in cm) of 10 persons, calculate standard deviation and coefficient of
variation:
170, 165, 150, 154, 163, 169, 155, 153, 164, 168 (Ans. σ = 7.52, V = 5%)
12. Given the marks of ten students, calculate standard deviation and coefficient of variation:
70, 80, 90, 85, 65, 55, 75, 84, 97, 59 (Ans. σ = 12.98, V = 17.08%)
13. Find out standard deviation and coefficient of variation.
Variable    Frequency
0 - 5       2
5 - 10      5
10 - 15     7
15 - 20     13
20 - 25     2
25 - 30     16
30 - 35     8
35 - 40     3
(Ans. σ = 9.25, V = 42.63%)
14. Calculate the standard deviation and coefficient of variation.
Marks No. of students
0 – 10 5
10 -20 10
20 – 30 20
30 – 40 40
40 – 50 30
50 – 60 20
60 – 70 10
70 - 80 4
(Ans. σ = 15.68, V = 39.80%)
15. Calculate the standard deviation and coefficient of variation of age of member of a society.
Age No. of members
20 - 30 3
30 – 40 61
40 – 50 132
50 – 60 153
60 – 70 140
70 – 80 51
80 - 90 2
(Ans. σ = 11.87, V = 21.69%)
16. Calculate the standard deviation and coefficient of variation of pocket money received by
students.
Pocket money (₹) No. of students
Below 5 6
Below 10 16
Below 15 28
Below 20 38
Below 25 46
17. The mean and standard deviation of two series X and Y are:
Series X Series Y
Mean 50 80
S. D. 10 20
Which series shows lower variation?
(Ans. Vx = 20%, Vy = 25%)
18. The mean and standard deviation of marks obtained by sections A and B are:
Series A Series B
Mean 20 21
S. D. 6 9
Which series shows lower variation?
(Ans. VA = 30%, VB = 42.86%)
Answer to the multiple choice Questions:
1. (b) 2. (c) 3. (a) 4. (d) 5. (b) 6. (d)
CHAPTER: 11 CORRELATION
11.1 MEANING AND SIGNIFICANCE
11.1.1 Meaning
The word ‘correlation’ means association between two or more variables. The measure of
correlation, called correlation coefficient, expresses this relationship in quantitative terms.
Correlation coefficient is a measure of degree of association only. In no way does it indicate that
there exists some causal relationship between the variables. Two variables may appear to be
correlated, but, in fact, there may not be any significant relation or connection between the two. If
there is any cause and effect relationship between two variables, it is to be established independently
of the coefficient of correlation. The coefficient at the most may be taken as a pointer of relationship.
It is not a proof of the relationship.
Correlation between variables may be just accidental. Population and national income of a
country may be both rising over a period of time. But, it may be difficult to infer from this that
national income is rising because population is rising, or the other way round that population is
rising because national income is rising. There may be some relationship between the two but the
correlation measure in itself is not sufficient to indicate it. The relationship has to be established
through some logical reasoning or through some other variables.
To give meaning to a measure of correlation it is necessary to establish some causal relationship
between the variables. You must have heard about an important law in economics, called the law of
demand. In this law, a relationship has been established between change in price of a good and its
quantity demanded. The law states that in case of most of the goods, when price changes, demand
changes in the opposite direction. In other words, when price of a good falls, the market demand for
that good rises, or when price rises, demand falls. The law states that there is inverse relation
between price and demand. In terms of correlation we can say that there is negative correlation
between change in price and the consequent change in demand. When we calculate the coefficient of
correlation in this case we will find its value to be negative.
Inverse relation between change in price and change in demand does not mean that there is also
the inverse relation between change in demand and the consequent change in price. In fact, when
demand rises, all other things remaining the same, the price also rises. Why this difference? To
understand this we must make a distinction between cause and effect. When we say that there is
inverse relation between price and demand, we take price as the cause and demand as the effect. The
value of coefficient of correlation in this case would be negative. When we say that there is direct
and positive relation between demand and price, we take demand as a cause and price as the effect.
The value of correlation coefficient in this case would be positive. Thus, while analysing a
correlation measure, we must keep in view the “cause and effect” relationship between the
variables. If we do not keep this in mind we may draw meaningless results.
To sum up, a measure of correlation is only an expression of quantitative relationship
between two variables, say x and y. It does not indicate any causal relation between x and y.
Relation between the two variables, if any, is established independently of the correlation
coefficient, which can only indicate the strength of the relationship.

11.1.2 Significance
Coefficient of correlation is a measure of degree of association between two variables. It reveals
two things about the possible association.
First, what is the direction of association? If the value is positive, there is direct association, as is
usually the case between the price of a good and the supply of that good. If the value is negative,
there is inverse relation, as is usually the case between the price of a good and the market demand
for that good.
Second, the absolute value of the coefficient indicates how strong the relationship is. The absolute
value varies between zero and one. Zero means no association while one means 100 percent
association. The nearer the value is to one, the stronger is the relationship between the two variables.
The correlation coefficient is of great significance to government in policy formulation. For
example, the measure of association between money supply and inflation rate is of great help in
controlling inflation.
It is also of great significance to businessmen. Measures of association between price and demand,
between weather and demand, between age groups and demand, etc., can be of great help in planning
their production activities.

11.2 TYPES AND KINDS OF CORRELATION


11.2.1 Introduction
There are many types/kinds of correlation. Some of these are : (a) positive and negative
correlation; (b) simple, partial and multiple correlation and (c) linear and non-linear correlation.
11.2.2 Simple, Partial and Multiple Correlation
When we study the relation between only two variables, it is termed as simple correlation, say
between price and demand. A study of relation between more than two variables is a study of
multiple correlation, say between price, income and demand. In the case of partial correlation, we
examine the degree of association between two variables but at the same time take into consideration
the fact that other variables are also operative. For example, we examine the relation between price
and demand but at the same time recognise that demand may also have been influenced by income
or other factors. The study of correlation in this chapter is confined to only simple correlation.
11.2.3 Positive and Negative Correlation
When both variables move in the same direction, it is called positive correlation. For example,
when demand rises and, as a result, price also rises, there is positive correlation. When the two
variables move in opposite directions, it is called negative correlation. The inverse relation between
price and demand is an example of negative correlation.
11.2.4 Linear and Non-linear Correlation
When both the variables change in the same proportion, it is called linear correlation. For
example, consider the following:

X 1 2 3 4
Y 2 4 6 8
There is a linear correlation between X and Y in the above case. When the two variables change
in different ratios it is a case of non-linear correlation.

11.3 COMPUTATION OF CORRELATION


11.3.1 Introduction
There are many methods of study of correlation. Out of these we shall study two methods. These
are (i) Scatter diagram and (ii) Karl Pearson’s product moment method.

11.3.2 Scatter Diagram


It is a graphic method and gives only an indication of correlation, if any, between two variables.
We cannot calculate exact degree of relationship.
In this method we use graph. Suppose x and y are the two variables. We show x variable on x-
axis and y variable on y-axis. Each pair of value of x and y is shown as a dot (•) on the graph. After
plotting all the pairs of x and y, we study the scatter of these dots and draw broad conclusions about
the extent and direction of correlation. We draw conclusions by observing the direction of dots on a
scatter diagram. Movement of dots from left to right upwards indicates positive correlation (Figure
11.1). Movement from left to right downwards indicates negative correlation (Figure 11.2). If all the
dots lie in a single straight line, it indicates perfect correlation between two variables (Figures 11.1
and 11.2). If there is no clear-cut movement either upwards or downwards, the correlation between
two variables is either absent or the degree is very low (Figure 11.3).
We cannot measure exact degree of correlation on a scatter diagram. For an exact measure we
have to resort to a statistical method. There are many statistical methods. Among these Karl
Pearson’s method is widely used. We now explain this method.
11.3.3 Karl Pearson’s Method of Calculating Degree of Correlation: Two Variables
(Ungrouped Data)
Basic method
Karl Pearson’s method is also known as the Product-Moment method. The basic method is as
follows:
$$\gamma = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad \text{(i)}$$
where, γ = correlation coefficient
x = X − X̄ (deviation of X from its mean X̄)
y = Y − Ȳ (deviation of Y from its mean Ȳ)
N = Number of observations
σx = Standard deviation of series X
σy = Standard deviation of series Y
In the above method, deviations are taken from the actual mean. It makes the calculation somewhat
difficult. To find γ, we have to first find out X̄, Ȳ, σx and σy. There is another variation of the above
formula by using which we can avoid direct calculation of σx and σy. It is as follows :
$$\gamma = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} \qquad \text{(ii)}$$
where, x = X − X̄ and y = Y − Ȳ.
There is another variation of the above method in which deviations are taken from the assumed
mean. This method avoids calculation of actual mean.

11.3.4 Direct Method
There is still another variation of the Karl Pearson’s method which avoids not only direct
computation of X̄, Ȳ and the standard deviations but also the need of taking deviations from an
assumed mean. The formula is :
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}} \qquad \text{(iii)}$$

The above method is called direct method of calculating coefficient of correlation. It is called
direct because it does not require computation of either mean or standard deviation.
Method (i) and (ii) above are beyond the scope of this chapter. We will study only method (iii)
i.e., direct method. We give below an example of the method.
Example.
Calculate the coefficient of correlation between price of a good and its demand by direct method.
Price (₹) 7 6 5 4 3 2 1
Demand (units) 10 12 15 20 30 40 50

Solution :

Price (₹)          Demand (units)
X       X²         Y       Y²        XY
7       49         10      100       70
6       36         12      144       72
5       25         15      225       75
4       16         20      400       80
3       9          30      900       90
2       4          40      1600      80
1       1          50      2500      50
ΣX = 28   ΣX² = 140   ΣY = 177   ΣY² = 5869   ΣXY = 517
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$= \frac{(7 \times 517) - (28 \times 177)}{\sqrt{(7 \times 140) - (28)^2}\,\sqrt{(7 \times 5869) - (177)^2}}$$
$$= \frac{3619 - 4956}{\sqrt{980 - 784}\,\sqrt{41083 - 31329}} = \frac{-1337}{\sqrt{196} \times \sqrt{9754}} = \frac{-1337}{\sqrt{196 \times 9754}}$$
$$= \frac{-1337}{\sqrt{1911784}} = \frac{-1337}{1382.67} = -0.97$$
The coefficient indicates high degree of correlation between price and demand. The minus sign
indicates negative correlation, i.e., when price falls demand rises. In this example price is the cause
and demand is the effect. This cause and effect relationship is established independent of the
coefficient of correlation. The value of γ = - 0.97, only indicates correlation between price and
demand. It does not in itself indicate that price is the cause and demand is the effect.
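The direct method, formula (iii), can also be checked with a short program. The following is only a minimal sketch in Python; the function name `pearson_direct` is an illustrative choice and not part of the text.

```python
from math import sqrt

def pearson_direct(xs, ys):
    """Coefficient of correlation by the direct method, formula (iii)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)               # sum of X squared
    sum_y2 = sum(y * y for y in ys)               # sum of Y squared
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of the products XY
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

price = [7, 6, 5, 4, 3, 2, 1]
demand = [10, 12, 15, 20, 30, 40, 50]
print(round(pearson_direct(price, demand), 2))  # -0.97
```

The sketch does not guard against a series that does not vary at all; in that case the denominator is zero and the division fails.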
Let us take another example.
Example.
Calculate coefficient of correlation between demand for a good and its price.
Demand 10 15 25 40 50
Price 1 2 3 4 6

Solution :
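Demand             Price
X       X²         Y       Y²        XY
10      100        1       1         10
15      225        2       4         30
25      625        3       9         75
40      1600       4       16        160
50      2500       6       36        300
ΣX = 140   ΣX² = 5050   ΣY = 16   ΣY² = 66   ΣXY = 575
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$= \frac{(5 \times 575) - (140 \times 16)}{\sqrt{(5 \times 5050) - (140)^2}\,\sqrt{(5 \times 66) - (16)^2}}$$
$$= \frac{2875 - 2240}{\sqrt{25250 - 19600}\,\sqrt{330 - 256}} = \frac{635}{\sqrt{5650} \times \sqrt{74}} = \frac{635}{\sqrt{5650 \times 74}}$$
$$= \frac{635}{\sqrt{418100}} = \frac{635}{646.6} = +0.98$$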

In the above example γ = + 0.98. It shows that there is a very high degree of positive correlation
between demand for a good and its price. In this example demand is a cause and price is the effect.
11.3.5 Value of Coefficient of Correlation (γ )
The value of γ varies between +1 and -1. If γ = +1, it implies perfect positive correlation. If
γ = -1, it implies perfect negative correlation. If γ = 0, it implies absence of correlation. The nearer
the value of γ is to +1, the higher is the degree of positive correlation. The nearer the value of γ is to
-1, the higher is the degree of negative correlation.
In case of negative correlation, when we compare the values of coefficients we ignore the minus
sign. We compare the values in absolute terms. For example, if γ1 = -0.6 and γ2 = -0.8, then γ2 is
taken as greater than γ1 (while mathematically γ1 is greater than γ2). This point must be kept in mind
in comparing the values of γ.

11.4 SOLVED EXAMPLES


Example 1. Calculate coefficient of correlation between marks in Mathematics and marks in
Economics obtained by 10 students.
S.No. Marks in Mathematics Marks in Economics
1 20 18
2 5 10
3 15 15
4 12 11
5 18 16
6 4 8
7 8 9
8 14 15
9 9 10
10 0 5
Solution :
S.No.   Marks in Mathematics    Marks in Economics
        X       X²              Y       Y²          XY
1       20      400             18      324         360
2       5       25              10      100         50
3       15      225             15      225         225
4       12      144             11      121         132
5       18      324             16      256         288
6       4       16              8       64          32
7       8       64              9       81          72
8       14      196             15      225         210
9       9       81              10      100         90
10      0       0               5       25          0
ΣX = 105   ΣX² = 1475   ΣY = 117   ΣY² = 1521   ΣXY = 1459
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$= \frac{(10 \times 1459) - (105 \times 117)}{\sqrt{(10 \times 1475) - (105)^2}\,\sqrt{(10 \times 1521) - (117)^2}}$$
$$= \frac{14590 - 12285}{\sqrt{14750 - 11025}\,\sqrt{15210 - 13689}} = \frac{2305}{\sqrt{3725} \times \sqrt{1521}} = \frac{2305}{\sqrt{3725 \times 1521}}$$
$$= \frac{2305}{\sqrt{5665725}} = \frac{2305}{2380.27} = +0.97$$
Example 2.
Calculate correlation coefficient between X series and Y series.
X 10 8 5 11 7 4 2
Y 5 9 4 14 0 5 3

Solution :

X       X²      Y       Y²      XY
10      100     5       25      50
8       64      9       81      72
5       25      4       16      20
11      121     14      196     154
7       49      0       0       0
4       16      5       25      20
2       4       3       9       6
ΣX = 47   ΣX² = 379   ΣY = 40   ΣY² = 352   ΣXY = 322
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$= \frac{(7 \times 322) - (47 \times 40)}{\sqrt{(7 \times 379) - (47)^2}\,\sqrt{(7 \times 352) - (40)^2}}$$
$$= \frac{2254 - 1880}{\sqrt{2653 - 2209}\,\sqrt{2464 - 1600}} = \frac{374}{\sqrt{444}\,\sqrt{864}} = \frac{374}{\sqrt{383616}}$$
$$= \frac{374}{619.37} = +0.60$$

Example 3.
Calculate the coefficient of correlation between the two series :
X 1 2 3 4 5 6
Y 2 4 6 8 10 12
Solution :
X       X²      Y       Y²      XY
1       1       2       4       2
2       4       4       16      8
3       9       6       36      18
4       16      8       64      32
5       25      10      100     50
6       36      12      144     72
ΣX = 21   ΣX² = 91   ΣY = 42   ΣY² = 364   ΣXY = 182
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$= \frac{(6 \times 182) - (21 \times 42)}{\sqrt{(6 \times 91) - (21)^2}\,\sqrt{(6 \times 364) - (42)^2}}$$
$$= \frac{1092 - 882}{\sqrt{546 - 441}\,\sqrt{2184 - 1764}} = \frac{210}{\sqrt{105} \times \sqrt{420}} = \frac{210}{\sqrt{44100}} = \frac{210}{210} = +1$$

Example 4.
Calculate the coefficient of correlation between X and Y.

X 1 2 3 4 5
Y 3 3 3 3 3
Solution :

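X       X²      Y       Y²      XY
1       1       3       9       3
2       4       3       9       6
3       9       3       9       9
4       16      3       9       12
5       25      3       9       15
ΣX = 15   ΣX² = 55   ΣY = 15   ΣY² = 45   ΣXY = 45
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$= \frac{(5 \times 45) - (15 \times 15)}{\sqrt{(5 \times 55) - (15)^2}\,\sqrt{(5 \times 45) - (15)^2}}$$
$$= \frac{225 - 225}{\sqrt{275 - 225}\,\sqrt{225 - 225}} = \frac{0}{\sqrt{50} \times \sqrt{0}} = \frac{0}{0}$$
Strictly speaking, 0/0 is indeterminate. Since Y does not vary at all, there is no co-variation
between X and Y, and γ is taken as 0.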
11.5 RANK CORRELATION
11.5.1 Introduction
The measure of rank correlation was developed by Charles Edward Spearman. It is an
alternative to the Karl Pearson’s product-moment method. There are three main uses of this measure :
1. Its computation is quicker, especially when the number of items is small.
2. It is useful for series of data that are given as ranks, scores, standings, etc.
3. It permits us to correlate two sets of qualitative observations which are subject to ranking.

11.5.2 Method
Suppose we are given two series of data with variables X and Y. Let these two variables be the
marks obtained by a group of 10 students in English (X) and Hindi (Y)
Marks obtained by students
S.No.   Marks in English     Marks in Hindi
        (X)      Ranks       (Y)      Ranks
(1)     (2)      (3)         (4)      (5)
1       80       1           60       4
2       75       2           75       1
3       66       3           40       7
4       60       4           36       8
5       54       5           65       3
6       50       6           74       2
7       43       7           43       6
8       38       8           30       9
9       30       9           50       5
10      10       10          20       10
To calculate rank correlation from the above data the following steps are required to be taken.
Steps
1. Arrange both the series in ascending or in descending order of value in the original data.
2. Suppose we arrange the series in descending order. First write the X series in the descending
order (Column 2) and also write ranks (Column 3). Then write the rank of the corresponding Y
value, as it stands in the Y-series, against each ranked item of the X-series (Column 5). In our
illustration student No. 1 gets first rank in English but 4th rank in Hindi. It makes one pair of
ranks, i.e., 1 and 4. Like this we write all the pairs of ranks (Col. 3 and 5).
Note that in the above table no value appears more than once. Therefore, no rank is repeated.
This is the “unrepeated ranks case”.
What happens if a value in a series appears more than once? If so, the method of ranking
changes. Let us take an example. Refer to the following:
S.No.   Marks in English          Marks in Hindi
        Marks (X)   Rank (X)      Marks (Y)   Rank (Y)
(1)     (2)         (3)           (4)         (5)
1       20          1             5           (10) 10
2       18          2             10          (5) 5.5
3       15          3             15          (2) 2.5
4       14          4             9           (7) 7
5       12          5             16          (1) 1
6       9           6             11          (4) 4
7       8           7             15          (3) 2.5
8       5           8             10          (6) 5.5
9       4           9             6           (9) 9
10      0           10            8           (8) 8
Steps
1. Arrange x-series in descending order. It is already so in the above table. (Col. 2). There are no
repeated values in this series.
2. Assign ranks to series X (Col. 3)
3. Assign ranks to series Y according to serial order, ignoring repeated values. It is done in
brackets. This is incorrect ranking. Why should two equal values get different ranks? To correct
the ranking take the 4th step.
4. Serial Nos. 3 and 7 in series Y have the same value, i.e., 15. Find the average of the
incorrect ranks 2 and 3, which equals 2.5, i.e., (2 + 3)/2. Assign this rank to both serial nos. 3 and 7.
In the same manner assign the rank 5.5, i.e., (5 + 6)/2, to serial numbers 2 and 8. The rest of the
serial numbers in series Y will hold the ranks as given in brackets.
The above is “repeated ranks case”.
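The ranking steps for repeated values can be sketched in Python as follows. This is only an illustration: the function name `average_ranks` is an assumption, and the sixth Hindi mark is read as 11, which is the value the ranks in the table imply.

```python
def average_ranks(values, descending=True):
    """Rank the values (rank 1 = highest when descending).
    Equal values share the average of the serial ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the block while the next value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2  # average of the serial ranks in the block
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

hindi = [5, 10, 15, 9, 16, 11, 15, 10, 6, 8]
print(average_ranks(hindi))
# [10.0, 5.5, 2.5, 7.0, 1.0, 4.0, 2.5, 5.5, 9.0, 8.0]
```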
The method of calculation of rank correlation differs in the “repeated ranks case” from the
“unrepeated ranks case”. The two methods are the following:
(a) Unrepeated Ranks Case
Apply the following formula
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
where, ρ = Rank correlation coefficient (the Greek letter “rho”)
D = Difference between each pair of ranks
N = Number of paired items.
Computation of Rank Correlation
S.No.   Marks Obtained        Rank Obtained         D         D²
        English   Hindi       English   Hindi
        (X)       (Y)         (X)       (Y)         (X - Y)   (X - Y)²
1       2         3           4         5           6         7
1       80        60          1         4           -3        9
2       75        75          2         1           1         1
3       66        40          3         7           -4        16
4       60        36          4         8           -4        16
5       54        65          5         3           2         4
6       50        74          6         2           4         16
7       43        43          7         6           1         1
8       38        30          8         9           -1        1
9       30        50          9         5           4         16
10      10        20          10        10          0         0
N = 10                                                        ΣD² = 80
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 80}{10(100 - 1)} = 1 - \frac{480}{990} = 1 - 0.48 = +0.52$$
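The same computation can be sketched in Python. The function name `spearman_unrepeated` is an illustrative choice; the ranks are those of the table above.

```python
def spearman_unrepeated(rank_x, rank_y):
    """Rank correlation when no rank is repeated: 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

english_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hindi_rank = [4, 1, 7, 8, 3, 2, 6, 9, 5, 10]
print(round(spearman_unrepeated(english_rank, hindi_rank), 2))  # 0.52
```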
(b) Repeated Ranks Case
Apply the following formula:

$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right]}{N(N^2 - 1)}$$
where m = number of times a rank is repeated.
In the above formula (1/12)(m³ − m) is the correction factor. There are as many
correction factors as there are cases of repeated ranks. In our example given below there are
two cases of repeated ranks. So there are two correction factors as shown in the following
calculation.
Computation of Rank Correlation
S.No.   Marks Obtained           Rank Obtained            D         D²
        Economics   Commerce     Economics   Commerce
        (X)         (Y)          (X)         (Y)          (X - Y)   (X - Y)²
1       2           3            4           5            6         7
1       20          5            1           10.0         -9.0      81.00
2       18          10           2           6.5          -4.5      20.25
3       15          15           3           3.5          -0.5      0.25
4       14          9            4           8.0          -4.0      16.00
5       12          8            5           9.0          -4.0      16.00
6       9           16           6           2.0          4.0       16.00
7       8           11           7           5.0          2.0       4.00
8       5           15           8           3.5          4.5       20.25
9       4           10           9           6.5          2.5       6.25
10      0           18           10          1.0          9.0       81.00
N = 10                                                              ΣD² = 261.00
Substituting values, we get:
$$\rho = 1 - \frac{6\left[261 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10(100 - 1)}$$
$$= 1 - \frac{6\left[261 + \frac{1}{12}(6) + \frac{1}{12}(6)\right]}{10 \times 99}$$
$$= 1 - \frac{6(261 + 0.5 + 0.5)}{990} = 1 - \frac{6 \times 262}{990} = 1 - \frac{1572}{990} = 1 - 1.588 = -0.588$$
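A minimal Python sketch of the repeated ranks formula follows. The names `tie_correction` and `spearman_repeated` are illustrative assumptions; the correction is computed by counting how many times each rank occurs in each series.

```python
from collections import Counter

def tie_correction(ranks):
    """Sum of (m^3 - m)/12 over the ranks of one series; m = 1 contributes zero."""
    return sum((m ** 3 - m) / 12 for m in Counter(ranks).values())

def spearman_repeated(rank_x, rank_y):
    """Rank correlation with one correction factor per case of repeated ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    correction = tie_correction(rank_x) + tie_correction(rank_y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

economics_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
commerce_rank = [10, 6.5, 3.5, 8, 9, 2, 5, 3.5, 6.5, 1]
print(round(spearman_repeated(economics_rank, commerce_rank), 3))  # -0.588
```

When no rank is repeated the correction is zero, so the function reduces to the unrepeated ranks formula.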

11.5.3 Interpretation of the Value of ρ
The value of ρ varies from +1 to -1.
1. If each X and its paired Y have exactly the same rank we have perfect positive correlation with
a coefficient of +1.
2. If the ranks are such that highest ranking X goes with the lowest ranking Y and so on, we have
perfect negative correlation with coefficient of -1.
3. In between these two extremes, the values of ρ may vary between +1 and -1.
11.5.4 Comparison with Karl Pearson’s Method
Rank correlation gives less importance to the extreme values because it gives them rank. In
comparison Karl Pearson’s product-moment correlation gives more importance to extreme values
because it is based on actual values.
Rank correlation is an alternative to product-moment method. But it is only a rough measure, and
therefore, should not be used indiscriminately.
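The different treatment of extreme values can be seen in a small sketch. The data below are hypothetical and chosen only for illustration: Y rises steadily with X except for one very large last value. The function names are also illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Product-moment coefficient by the direct method."""
    n = len(xs)
    num = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    den = sqrt(n * sum(x * x for x in xs) - sum(xs) ** 2) * \
          sqrt(n * sum(y * y for y in ys) - sum(ys) ** 2)
    return num / den

def ranks(values):
    """Ascending ranks; this simple version assumes no repeated values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Rank correlation for the unrepeated ranks case."""
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 60]   # hypothetical series with one extreme value
print(round(pearson(x, y), 2), spearman(x, y))  # 0.7 1.0
```

The extreme value 60 pulls the product-moment coefficient down to about 0.7, while the rank correlation stays at 1 because the order of the items is unchanged.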
11.5.5 Solved Examples
Example 1
Calculate rank correlation coefficient of Example 1 in Section 11.4.
Solution :

S.No.   Marks Obtained         Rank Obtained          D         D²
        Maths   Economics      Maths   Economics
        (X)     (Y)            (X)     (Y)            (X - Y)   (X - Y)²
1       2       3              4       5              6         7
1       20      18             1       1              0         0
2       18      16             2       2              0         0
3       15      15             3       3.5            -0.5      0.25
4       14      15             4       3.5            0.5       0.25
5       12      11             5       5              0         0
6       9       10             6       6.5            -0.5      0.25
7       8       9              7       8              -1.0      1.00
8       5       10             8       6.5            1.5       2.25
9       4       8              9       9              0         0
10      0       5              10      10             0         0
N = 10                                                          ΣD² = 4
$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right]}{N(N^2 - 1)}$$
$$= 1 - \frac{6\left[4 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10(10^2 - 1)} = 1 - \frac{6\left[4 + \frac{1}{12}(6) + \frac{1}{12}(6)\right]}{10 \times 99}$$
$$= 1 - \frac{6(4 + 0.5 + 0.5)}{990} = 1 - \frac{6 \times 5}{990} = 1 - \frac{30}{990} = 1 - 0.03 = +0.97$$
Example 2.
Calculate rank correlation coefficient from the data in example 2 in section 11.4
Solution:
S.No.   X       Y       X-rank   Y-rank   D        D²
1       2       3       1        2        -1       1
2       4       5       2        4.5      -2.5     6.25
3       5       4       3        3        0        0
4       7       0       4        1        3        9
5       8       9       5        6        -1       1
6       10      5       6        4.5      1.5      2.25
7       11      14      7        7        0        0
N = 7                                              ΣD² = 19.5
$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right]}{N(N^2 - 1)}$$
$$= 1 - \frac{6\left[19.5 + \frac{1}{12}(2^3 - 2)\right]}{7(7^2 - 1)} = 1 - \frac{6\left[19.5 + \frac{1}{12}(6)\right]}{7 \times 48}$$
$$= 1 - \frac{6(19.5 + 0.5)}{336} = 1 - \frac{6 \times 20}{336} = 1 - \frac{120}{336} = 1 - 0.36 = +0.64$$
Example 3.
Calculate rank correlation coefficient from data in Example 3 in Section 11.4.
Solution :

S.No.   X       Y       X-rank   Y-rank   D        D²
1       1       2       1        1        0        0
2       2       4       2        2        0        0
3       3       6       3        3        0        0
4       4       8       4        4        0        0
5       5       10      5        5        0        0
6       6       12      6        6        0        0
N = 6                                              ΣD² = 0
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 0}{6(36 - 1)} = 1 - \frac{0}{210} = 1 - 0 = +1$$
Example 4
Calculate rank correlation coefficient from data in Example 4 in Section 11.4.
Solution :
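S.No.   X       Y       X-rank   Y-rank   D        D²
1       1       3       5        3        2        4
2       2       3       4        3        1        1
3       3       3       3        3        0        0
4       4       3       2        3        -1       1
5       5       3       1        3        -2       4
N = 5                                              ΣD² = 10
Here all five values of Y are equal, so the rank 3 (the average of the ranks 1 to 5) is repeated
five times, i.e., m = 5.
$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right]}{N(N^2 - 1)}$$
$$= 1 - \frac{6\left[10 + \frac{1}{12}(5^3 - 5)\right]}{5(5^2 - 1)} = 1 - \frac{6\left[10 + \frac{1}{12}(125 - 5)\right]}{5 \times 24}$$
$$= 1 - \frac{6(10 + 10)}{120} = 1 - \frac{120}{120} = 1 - 1 = 0$$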

POINTS TO REMEMBER
• A measure of correlation is an expression of quantitative relationship between two
variables. However, it does not indicate any cause and effect relationship between the
variables.
• Study of relation between two variables is called simple correlation. Study of relation
between more than two variables is called multiple correlation.
• When both the variables move in the same direction, it is called positive correlation.
When both the variables move in the opposite direction, it is called negative correlation.
• When both the variables change in the same proportion, it is called linear correlation.
When the two variables change in different ratios, it is called non-linear correlation.
• Scatter diagram is a graphic method which indicates correlation but does not calculate the
exact degree.
• The basic method of calculation of coefficient of correlation (γ) is the product moment
method given by Karl Pearson. According to this method :
$$\gamma = \frac{\sum xy}{N \sigma_x \sigma_y}$$
• There are many variations of this method. But there is one method which does not require
calculation of mean and standard deviation. It also does not require assumed mean. The
method is :
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}}$$
It is called direct method.
• The value of coefficient of correlation varies between +1 and -1. If γ = 1, it implies perfect
positive correlation. If γ = -1, it implies perfect negative correlation. If γ = 0, it implies
absence of correlation.
• Rank correlation was developed by Spearman.
• The method is
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
• This method is applied when no rank is repeated.
• In the repeated rank case the method applied is:
$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right]}{N(N^2 - 1)}$$
where m = number of times the rank is repeated.

EXERCISES
MULTIPLE CHOICE QUESTIONS [1 Mark]
(Answers at the end of exercises)
Choose the correct alternative in the following questions:
1. Spot the correct statement about simple correlation and partial correlation:
(a) Both study relation between only two variables.
(b) Both establish cause and effect relationship.
(c) Simple correlation takes into consideration other variables.
(d) Partial correlation confines to only two variables.
2. When X falls, Y also falls. There is perfect correlation between the two. The correlation
coefficient between the two is
(a) Zero (b) Infinity (c) +1 (d) -1
3. Correlation between two variables is
(a) Proof of relationship between two variables.
(b) Simply some pointer of relationship between two variables.
(c) Neither (a) nor (b)
(d) Either (a) or (b)

SHORT ANSWER QUESTIONS-I [3 Marks]
Answer the following questions in about 60 words.
1. Distinguish between positive and negative correlations.
2. Distinguish between simple and multiple correlations.
3. Distinguish between linear and non-linear correlations.
SHORT ANSWER QUESTIONS-II [4 Marks]
Answer the following questions in about 70 words.
1. Does a measure of correlation indicate cause and effect relationship? Explain.
2. Explain the importance of scatter diagram in the study of correlation.
3. Explain the significance of correlation.
NUMERICAL QUESTIONS
1. Calculate coefficient of correlation between series X and Y.
X    10    8    6    4    2
Y    2     4    6    8    10
(Ans. -1)
2. Calculate coefficient of correlation between series A and B.
A    1    2    3    4    5    6    7
B    4    4    4    4    4    4    4
(Ans. 0)
3. Calculate coefficient of correlation between age of wives and age of husbands.
Age of wives       20    21    19    25    24
Age of husbands    24    24    21    30    24
(Ans. + 0.81)
4. Calculate coefficient of correlation between prices and quantity supplied at each price.
Price (₹)         1     2     3     4     5
Supply (units)    10    12    14    20    25
(Ans. + 0.97)
5. Calculate rank correlation from data given in Q. No. 1. (Ans. -1)
6. Calculate rank correlation from data given in Q. No. 2. (Ans. 0)
7. Calculate rank correlation from data given in Q. No. 3. (Ans. 0.8)
8. Calculate rank correlation from data given in Q. No. 4. (Ans. 1.0)
Calculate coefficient of correlation between Happiness Score and total number of medals
(Gold + Silver + Bronze) won by the top 10 winner countries in Rio (Brazil) Olympics 2016.
Country          Happiness Score *    Total No. of
                 (out of max. 10)     medals won
USA              7.119                119
Great Britain    6.867                66
China            5.140                70
Russia           5.716                56
Germany          6.750                42
Japan            5.987                41
France           6.575                40
South Korea      5.984                21
Italy            5.948                28
Australia        7.284                29
* Source : World Happiness Report (2015) : New York : Sustainable Development Solution Network.
Calculate coefficient of correlation between PPP Gross National Income (GNI) per capita and
total number of medals (Gold + Silver + Bronze) won by 10 winner countries in Rio (Brazil)
Olympics 2016.
Country          PPP GNI * per capita (2013)    Total No. of
                 (thousand dollars)             medals won
USA              52.3                           119
Great Britain    35.0                           66
China            11.5                           70
Russia           22.6                           56
Germany          43.0                           42
Japan            36.7                           41
France           36.6                           40
South Korea      30.3                           21
Italy            32.7                           28
Australia        41.5                           29
(Ans. + 0.214)
* Source : Human Development Report 2014.
* Source : Human Development Report 2014.

Answer to the Multiple Choice Questions


1. (a) 2. (c) 3. (b)
12.1 INTRODUCTION
Study of index numbers is a vast field. We can talk of meaning, methods of construction, types,
importance, applications in decision-making and so on. The scope of this chapter is, however, limited
to the meaning and certain important types of index numbers. Among the types we will explain (i)
wholesale price index, (ii) consumer price index and (iii) index of industrial production.
12.2 MEANING AND SIGNIFICANCE OF INDEX NUMBERS
12.2.1 Meaning
An index number is basically an indicator. For example, a price index indicates change in price
levels of a single good or a group of goods over a time period. Similarly, there are production,
income, saving, expenditure, cost of living, population, employment, poverty, standard of living,
education, and so many other indices. Index numbers make possible the measurement of changes in a
variable from time to time and from place to place.
There is no unique definition of an index number. A definition of an index must point out the
major tasks the index number performs. Basically, it performs three tasks. First, it averages out the
changes in one or more variables. Second, it measures these changes over a time period and over
space. Third, it measures these changes with reference to a base which is normally a period, say a
year, a month, a week, etc. On the basis of these tasks which an index number performs, we can
define an index number as follows :
An index number is a specialized statistical average measuring changes in one or a group of
related variables with reference to time or place.

Take, for example, the General Index of Wholesale Prices of India :
TABLE 12.1
General Index of Wholesale Prices
(Base : 2011-12 = 100)
Year       Index
2011-12    100
2012-13    111
2013-14    122
2014-15    125
2015-16    125
2016-17    129
(Source : Economic Survey 2016-2017, Government of India, Ministry of Finance)
The index of the year 2011-12 is 100. The reference year, i.e., the base year (here 2011-12), is
always taken as 100. Now, compare 2016-17 with 2015-16. The index of 2016-17 shows that there is a
3.2% increase in the wholesale prices during 2016-17. Remember, this 3.2% change is an average
change, and it does not indicate that the price of every good or service has increased by 3.2%. In this
sense an index is a statistical average.
When we compare the index of one year with the index of base year, it tells us about the
percentage change between the base year and the year of comparison. For example, the index of
2016-17 is 129. It indicates that between 2011-12 and 2016-17, i.e., over the 5 year period, the
average wholesale prices of goods in India have increased by 29%. Note that this entire increase is
not during one year but over a period of 5 years. In this sense an index number measures changes
over a period of time.
However, for year to year comparison we have to make some more statistical calculations.
Suppose we want to find out the change in price level between 2015-16 and 2016-17. The indices of
the two years are 125 and 129 respectively. We can find out the change during 2016-17 in the
following way.
$$\frac{129}{125} \times 100 = 103.2$$
The calculation shows that the wholesale price level during the year 2016-17 increased by 3.2%.
Also, note that it is a general price index. By general it is meant that it includes nearly all
commodities in its scope, like food articles, non-food articles, manufactured products, etc. In this
way an index number shows changes in a group of related variables.
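The two comparisons made above, with the base year and from one year to the next, can be sketched in Python. The dictionary `wpi` holds the index values of Table 12.1; the function name `percent_change` is an illustrative choice.

```python
# General Index of Wholesale Prices (Base : 2011-12 = 100), from Table 12.1
wpi = {
    "2011-12": 100,
    "2012-13": 111,
    "2013-14": 122,
    "2014-15": 125,
    "2015-16": 125,
    "2016-17": 129,
}

def percent_change(index, start, end):
    """Percentage change in the index between two years."""
    return index[end] / index[start] * 100 - 100

print(round(percent_change(wpi, "2011-12", "2016-17"), 1))  # 29.0
print(round(percent_change(wpi, "2015-16", "2016-17"), 1))  # 3.2
```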
12.2.2 Significance of Index Numbers
An index number measures relative changes over a period of time. There are many index
numbers : Wholesale Price Index (WPI), Consumer Price Index (CPI), Index of Industrial Production
(IIP), Index of Agricultural Production and so on. Each has its own significance. The significance of
WPI, CPI and IIP is explained in section 12.5, 12.6 and 12.7 respectively. We explain here some of
the general uses of index numbers.
