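$$\bar{X} = \frac{\sum X}{N} = \frac{781}{11} = 71$$
$$\text{A.D.} = \frac{\sum |x|}{N} = \frac{146}{11} = 13.3 \text{ approx.}$$
$$\bar{X} = \frac{\sum fX}{N} = \frac{6800}{100} = 68$$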
Solved Examples
Example 1.
Calculate average deviation from the data of marks obtained by 10 students.
90, 64, 79, 33, 85, 59, 60, 70, 40, 95
Solution :
S.No.   Marks (X)   Deviation from mean, ignoring signs
                    $|x| = |X - \bar{X}|$
1 90 22.5
2 64 3.5
3 79 11.5
4 33 34.5
5 85 17.5
6 59 8.5
7 60 7.5
8 70 2.5
9 40 27.5
10 95 27.5
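∑X = 675   ∑|x| = 163
$$\bar{X} = \frac{\sum X}{N} = \frac{675}{10} = 67.5$$
$$\text{A.D.} = \frac{\sum |x|}{N} = \frac{163}{10} = 16.3$$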
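The calculation can be checked with a short Python sketch. It is only an illustration: the function name `average_deviation` and its `center` parameter are assumptions made here, not part of the text.

```python
from statistics import mean, median

def average_deviation(values, center="mean"):
    """Average of the absolute deviations from the mean (default) or the median."""
    c = mean(values) if center == "mean" else median(values)
    # Signs are ignored by taking absolute values.
    return sum(abs(v - c) for v in values) / len(values)

marks = [90, 64, 79, 33, 85, 59, 60, 70, 40, 95]
print(round(average_deviation(marks), 2))  # 16.3
```

Passing `center="median"` gives the average deviation from the median instead.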
Example 2.
Find out average deviation from the following items
8, 9, 12, 15, 17, 20, 24
Solution :
S.No.   Size (X)   Deviation from mean, ignoring signs
                   $|x| = |X - \bar{X}|$
1 8 7
2 9 6
3 12 3
4 15 0
5 17 2
6 20 5
7 24 9
∑X = 105   ∑|x| = 32
$$\bar{X} = \frac{105}{7} = 15$$
$$\text{A.D.} = \frac{\sum |x|}{N} = \frac{32}{7} = 4.57$$
Example 3.
Find average deviation from the following :
Wage (₹) 0-100 100-200 200-300 300-400 400-500
No. of workers 20 30 35 45 20
Solution :
Class       Mid-point   Frequency   Frequency ×   Deviation from mean,   Frequency ×
                                    mid-point     ignoring signs         deviation
            X           f           fX            |x|                    f|x|
0 - 100     50          20          1000          210                    4200
100 - 200   150         30          4500          110                    3300
200 - 300   250         35          8750          10                     350
300 - 400   350         45          15750         90                     4050
400 - 500   450         20          9000          190                    3800
                        N = 150     ∑fX = 39000                          ∑f|x| = 15700
$$\bar{X} = \frac{\sum fX}{N} = \frac{39000}{150} = 260$$
$$\text{A.D.} = \frac{\sum f|x|}{N} = \frac{15700}{150} = 104.67$$
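For grouped data the same idea can be sketched in Python, with each class represented by its mid-point. The function name `grouped_average_deviation` is hypothetical and chosen only for this illustration.

```python
def grouped_average_deviation(midpoints, freqs):
    """Return (mean, average deviation from the mean) for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    # Each absolute deviation is weighted by its class frequency.
    total_dev = sum(f * abs(x - mean) for x, f in zip(midpoints, freqs))
    return mean, total_dev / n

mean, ad = grouped_average_deviation([50, 150, 250, 350, 450], [20, 30, 35, 45, 20])
print(mean, round(ad, 2))  # 260.0 104.67
```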
Example 4.
Find average deviation from the following:
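$$\bar{X} = \frac{\sum fX}{N} = \frac{5100}{100} = 51 \text{ marks}$$
$$\text{A.D.} = \frac{\sum f|x|}{N} = \frac{1928}{100} = 19.28 \text{ marks}$$
$$\text{A.D.} = \frac{\sum |d|}{N} = \frac{145}{11} = 13.18 \text{ approx.}$$
$$\text{Med} = l_1 + \frac{\frac{N}{2} - \sum f_1}{f_{med}} \times i = 60 + \frac{50 - 30}{50} \times 10$$
$$= 60 + \frac{20 \times 10}{50} = 60 + \frac{200}{50} = 64$$
$$\text{A.D.} = \frac{\sum |d| f}{N} = \frac{1210}{100} = 12.1$$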
10.5.6 Characteristics of Average Deviation (AD)
(1) AD is based on all values.
(2) AD is lower when calculated from median as compared to when calculated from
arithmetic mean.
(3) Since AD ignores signs of deviation, it is incapable of being calculated from open
ended series.
10.6 STANDARD DEVIATION
10.6.1 Introduction
Standard deviation (SD) is a special form of average deviation. It is different from average
deviation in two respects. First, in computing standard deviation the average used is always the
arithmetic mean. Second, the deviations from mean are squared. The second step is helpful in avoiding
the problem involved in disregarding signs. Standard deviation is considered as the most important
measure of variation.
10.6.2 Methods
There are two methods of calculating Standard deviation, the long method and the short method. In
the long method the deviations are taken from the actual mean. In the short method the deviations are
taken from the assumed mean. Both the methods are used for ungrouped data and grouped data. Let us
first explain the case of ungrouped data.
10.6.3 Standard Deviation (Ungrouped Data)
Long method
The main steps are:
1. Find out arithmetic mean ($\bar{X}$).
2. Take deviations from the mean ($x = X - \bar{X}$).
3. Take the square of deviations ($x^2$).
4. Take the sum of squares ($\sum x^2$).
5. Divide the sum of squares ($\sum x^2$) by the number of items (N).
6. Take the square root of the result.
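The long method can be sketched in Python as follows. This is a minimal illustration rather than the text's own procedure; the function name `sd_long_method` is an assumption, and the data are the eleven marks of the series used in this section (mean 71).

```python
from math import sqrt

def sd_long_method(values):
    """Standard deviation with deviations taken from the actual mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    squares = [(v - mean) ** 2 for v in values]  # squared deviations x^2
    return sqrt(sum(squares) / n)

marks = [50, 51, 55, 59, 68, 70, 75, 80, 86, 92, 95]
print(round(sd_long_method(marks), 2))  # 15.34
```

Because the divisor is N, this agrees with `statistics.pstdev` from the Python standard library.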
$$\bar{X} = \frac{\sum X}{N} = \frac{781}{11} = 71 \text{ marks}$$
$$\sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{2590}{11}} = \sqrt{235.4545} = 15.34 \text{ approx.}$$
Shortcut method
Let us use the same series of data as in Table 10.3 but prepare a fresh table of computation of
standard deviation by shortcut method. In this method, we take deviations from assumed mean instead
of actual mean. The basic method is the same. But since we are taking deviations from the assumed
mean, we also take the correction factor to eliminate the difference between the actual mean and the
assumed mean. The main steps in the method are the following :
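1. Choose some value as assumed mean ($\bar{X}_d$), preferably one of the mid-points.
2. Take deviations (x) from the assumed mean.
3. Take the square of deviations ($x^2$).
4. Take the sum of squares of deviations ($\sum x^2$).
5. Take the sum of deviations from assumed mean ($\sum x$).
6-9. Divide both sums by N, square $\frac{\sum x}{N}$ to get the correction factor, subtract it from $\frac{\sum x^2}{N}$, and take the square root of the difference.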
Let us calculate standard deviation from Table 10.5
TABLE 10.5
Computation of Standard Deviation From an Array of Marks Obtained by Students
Item No.   Marks   Deviation from assumed    Square of
                   mean (70)                 deviations
           X       $x = X - \bar{X}_d$       $x^2$
1 50 -20 400
2 51 -19 361
3 55 -15 225
4 59 -11 121
5 68 -2 4
6 70 0 0
7 75 +5 25
8 80 +10 100
9 86 +16 256
10 92 +22 484
11 95 +25 625
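N = 11   ∑x = 11   ∑x² = 2601
Assumed mean = 70
$$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} = \sqrt{\frac{2601}{11} - \left(\frac{11}{11}\right)^2}$$
$$= \sqrt{236.4545 - 1} = \sqrt{235.4545} = 15.34 \text{ approx.}$$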
The value of standard deviation is the same by the shortcut method. This method saves us from
computing $\bar{X}$, i.e., the actual arithmetic mean.
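A short Python sketch shows why the choice of assumed mean does not matter: the correction factor removes its effect. The function name `sd_shortcut` is hypothetical.

```python
from math import sqrt

def sd_shortcut(values, assumed_mean):
    """Standard deviation from deviations about an assumed mean, with correction factor."""
    n = len(values)
    devs = [v - assumed_mean for v in values]
    sum_x = sum(devs)                    # sum of deviations
    sum_x2 = sum(d * d for d in devs)    # sum of squared deviations
    return sqrt(sum_x2 / n - (sum_x / n) ** 2)

marks = [50, 51, 55, 59, 68, 70, 75, 80, 86, 92, 95]
print(round(sd_shortcut(marks, 70), 2))  # 15.34
```

Trying another assumed mean, say 60, returns the same value up to rounding.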
10.6.4 Standard Deviation (Grouped Data)
Introduction
The basic method of computing standard deviation from grouped data is the same. We have to take
an additional step. Since there are classes, we have to first find out the mid-point of each class. This
mid-point is then taken as the representative value of each class.
As in the case of ungrouped data, there are two methods of computing standard deviation from
grouped data: the long method and the short method. In the long method we take deviations from the
actual arithmetic mean ($\bar{X}$). In the short method we take deviations from the assumed mean
($\bar{X}_d$). Let us now study the two methods.
Long Method
Let us use the data in Table 10.4 as an illustration. The main steps are :
(1) Find out mid-point of each class.
(2) Calculate $\bar{X}$, i.e., actual arithmetic mean.
(3) Find out deviation (x) from $\bar{X}$.
(4) Take the square of x, i.e., $x^2$.
(5) Multiply $x^2$ by frequency ($fx^2$).
(6) Apply the following formula:
$$\sigma = \sqrt{\frac{\sum f x^2}{N}}$$
The above steps are summed up in Table 10.6
TABLE 10.6
Computation of Standard Deviation by Long Method from Frequency Distribution of Marks
Obtained by Students
Class      Frequency    Mid-point   Mid-point ×   Deviation from        Square of     Square of deviation
(Marks)    (Students)   of class    frequency     mean (68)             deviation     × frequency
           f            X           fX            $x = X - \bar{X}$     $x^2$         $fx^2$
40-50 12 45 540 -23 529 6348
50-60 18 55 990 -13 169 3042
60-70 23 65 1495 -3 9 207
70-80 27 75 2025 +7 49 1323
80-90 15 85 1275 +17 289 4335
90-100 5 95 475 +27 729 3645
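N = 100   ∑fX = 6800   ∑fx² = 18900
$$\bar{X} = \frac{\sum fX}{N} = \frac{6800}{100} = 68 \text{ marks}$$
$$\sigma = \sqrt{\frac{\sum f x^2}{N}} = \sqrt{\frac{18900}{100}} = \sqrt{189} = 13.75 \text{ marks approx.}$$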
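The long method for grouped data can be sketched in Python with the figures of Table 10.6. The function name `grouped_sd` is an assumption made for this illustration.

```python
from math import sqrt

def grouped_sd(midpoints, freqs):
    """Return (mean, standard deviation) of a frequency distribution, long method."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    # Squared deviation of each mid-point, weighted by its frequency.
    sum_fx2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return mean, sqrt(sum_fx2 / n)

mean, sd = grouped_sd([45, 55, 65, 75, 85, 95], [12, 18, 23, 27, 15, 5])
print(mean, round(sd, 2))  # 68.0 13.75
```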
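i.e., $x' = \frac{x}{i}$. The formula is:
$$\sigma = i \times \sqrt{\frac{\sum f x'^2}{N} - \left(\frac{\sum f x'}{N}\right)^2}$$
where i = class interval and $\left(\frac{\sum f x'}{N}\right)^2$ = the correction factor. The first part is the usual long method.
$$\sigma = i \times \sqrt{\frac{\sum f x'^2}{N} - \left(\frac{\sum f x'}{N}\right)^2} = 10\sqrt{\frac{198}{100} - \left(\frac{30}{100}\right)^2}$$
$$= 10\sqrt{1.98 - (0.3)^2} = 10\sqrt{1.98 - 0.09} = 10\sqrt{1.89}$$
$$= 10 \times 1.375 \text{ approx.} = 13.75 \text{ approx.}$$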
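The step-deviation calculation can be reproduced with the Python sketch below. The assumed mean of 65 is inferred here, not stated in the text: it is the mid-point that reproduces ∑fx' = 30 and ∑fx'² = 198 for the data of Table 10.6. The function name `grouped_sd_step` is hypothetical.

```python
from math import sqrt

def grouped_sd_step(midpoints, freqs, assumed_mean, interval):
    """Shortcut (step-deviation) method: returns (sum fx', sum fx'^2, sigma)."""
    n = sum(freqs)
    steps = [(x - assumed_mean) / interval for x in midpoints]  # x' = x / i
    sum_fx = sum(f * s for s, f in zip(steps, freqs))
    sum_fx2 = sum(f * s * s for s, f in zip(steps, freqs))
    sigma = interval * sqrt(sum_fx2 / n - (sum_fx / n) ** 2)
    return sum_fx, sum_fx2, sigma

# Assumed mean 65 is an inference; class interval is 10.
sum_fx, sum_fx2, sigma = grouped_sd_step(
    [45, 55, 65, 75, 85, 95], [12, 18, 23, 27, 15, 5], 65, 10)
print(sum_fx, sum_fx2, round(sigma, 2))  # 30.0 198.0 13.75
```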
10.7 SOLVED EXAMPLES
Example 1.
Calculate standard deviation from the data on marks obtained by 10 students:
33, 40, 59, 60, 64, 70, 79, 85, 90, 95
Solution :
S.No.   Marks (X)   Deviation from assumed mean (70), x   Square of deviation, x²
1 33 -37 1369
2 40 -30 900
3 59 -11 121
4 60 -10 100
5 64 -6 36
6 70 0 0
7 79 +9 81
8 85 +15 225
9 90 +20 400
10 95 +25 625
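∑X = 675   ∑x = -25   ∑x² = 3857
Assumed mean = 70
$$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} = \sqrt{\frac{3857}{10} - \left(\frac{-25}{10}\right)^2}$$
$$= \sqrt{385.7 - 6.25} = \sqrt{379.45} = 19.48 \text{ marks approx.}$$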
Example 2.
Calculate standard deviation from the following items.
8, 9, 12, 15, 17, 20, 24
Solution :
S. No.   Size (X)   Deviation from mean    Square of dev.
                    $x = X - \bar{X}$      ($x^2$)
1 8 -7 49
2 9 -6 36
3 12 -3 9
4 15 0 0
5 17 +2 4
6 20 +5 25
7 24 +9 81
N = 7   ∑X = 105   ∑x² = 204
$$\bar{X} = \frac{105}{7} = 15$$
$$\sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{204}{7}} = \sqrt{29.14} = 5.40 \text{ approx.}$$
Example 3.
Find standard deviation from the following :
Wage (₹) 0-100 100-200 200-300 300-400 400-500
No. of workers 20 30 35 45 20
Solution :
Class      Frequency   Mid-point   Mid-point ×   Deviation from        Square of    Square of dev.
                                   frequency     mean                  deviation    × frequency
           f           X           fX            $x = X - \bar{X}$     $x^2$        $fx^2$
0-100 20 50 1000 - 210 44100 882000
100-200 30 150 4500 - 110 12100 363000
200-300 35 250 8750 - 10 100 3500
300-400 45 350 15750 + 90 8100 364500
400-500 20 450 9000 + 190 36100 722000
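N = 150   ∑fX = 39000   ∑fx² = 2335000
$$\bar{X} = \frac{\sum fX}{N} = \frac{39000}{150} = 260$$
$$\sigma = \sqrt{\frac{\sum f x^2}{N}} = \sqrt{\frac{2335000}{150}} = \sqrt{15566.67} = 124.77 \text{ approx.}$$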
Example 4.
Calculate standard deviation by shortcut method.
Marks 0-20 20 - 40 40 – 60 60 - 80 80 - 100
No. of students 12 21 31 22 14
Solution :
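Class      Mid-point   Frequency   Deviation from      Step dev.   Step dev. ×   Square of    Square of step
                                   assumed mean (50)               frequency     step dev.    dev. × frequency
           X           f           x                   x'          fx'           x'²          fx'²
0 - 20     10          12          - 40                -2          - 24          4            48
20 - 40    30          21          - 20                -1          - 21          1            21
40 - 60    50          31          0                   0           0             0            0
60 - 80    70          22          + 20                +1          + 22          1            22
80 - 100   90          14          + 40                +2          + 28          4            56
                       N = 100                                     ∑fx' = + 5                 ∑fx'² = 147
Assumed mean = 50
$$\sigma = i \times \sqrt{\frac{\sum f x'^2}{N} - \left(\frac{\sum f x'}{N}\right)^2} = 20\sqrt{\frac{147}{100} - \left(\frac{5}{100}\right)^2}$$
$$= 20\sqrt{1.47 - 0.0025} = 20\sqrt{1.4675} = 20 \times 1.2114 = 24.23 \text{ marks approx.}$$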
Example 5.
Solve example 3 by the assumed mean method.
Solution :
Class        Frequency   Mid-point   Deviation from       Step dev.   Step dev. ×   Square of    Square of step
                                     assumed mean (250)               frequency     step dev.    dev. × frequency
             (f)         (X)         x                    x'          fx'           x'²          fx'²
0 – 100 20 50 -200 -2 - 40 4 80
100 – 200 30 150 -100 -1 - 30 1 30
200 – 300 35 250 0 0 0 0 0
300 – 400 45 350 + 100 +1 + 45 1 45
400 – 500 20 450 + 200 +2 + 40 4 80
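N = 150   ∑fx' = +15   ∑fx'² = 235
$$\sigma = i \times \sqrt{\frac{\sum f x'^2}{N} - \left(\frac{\sum f x'}{N}\right)^2} = 100\sqrt{\frac{235}{150} - \left(\frac{15}{150}\right)^2}$$
$$= 100\sqrt{1.5667 - (0.1)^2} = 100\sqrt{1.5567} = 100 \times 1.2477 = 124.77 \text{ approx.}$$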
Then,
$$V \text{ of series A} = \frac{10}{80} \times 100 = \frac{1000}{80} = 12.5\%$$
$$V \text{ of series B} = \frac{10}{40} \times 100 = \frac{1000}{40} = 25\%$$
It is clear from the above that variation is relatively much larger in series B than in series A. (Also
note that absolute dispersion is same, in both the series). Thus, the comparison between the two series
can be made only through the use of measures of relative dispersion.
Illustration 2
Now we take an illustration in which arithmetic means are the same but standard deviations differ.
Suppose, following is known about the daily wage of two groups of workers :
            $\bar{X}$ (wage)   σ (wage)
x group :   50                 5
y group :   50                 10
$$V \text{ of x group} = \frac{\sigma}{\bar{X}} \times 100 = \frac{5}{50} \times 100 = 10\%$$
$$V \text{ of y group} = \frac{10}{50} \times 100 = 20\%$$
Therefore, variation is greater in the y group than in the x group, even though both the groups
have the same arithmetic mean.
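Both illustrations can be checked with a few lines of Python. The function name `coefficient_of_variation` is hypothetical and used only for this sketch.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the arithmetic mean."""
    return sd / mean * 100

# Illustration 1: same standard deviation, different means.
print(coefficient_of_variation(10, 80), coefficient_of_variation(10, 40))
# Illustration 2: same mean, different standard deviations.
print(coefficient_of_variation(5, 50), coefficient_of_variation(10, 50))
```

Because V is a pure percentage, it can be compared across series measured in different units.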
10.9.2 Solved Examples
Example 1.
Find the coefficient of variation from data in Example 1 in section 10.7
Solution:
$$\bar{X} = \frac{\sum X}{N} = \frac{675}{10} = 67.5$$
$$\sigma = 19.48$$
$$V = \frac{\sigma}{\bar{X}} \times 100 = \frac{19.48}{67.5} \times 100 = 28.86\%$$
Example 2.
Find coefficient of variation from data in example 2 in section 10.7.
Solution :
$$\bar{X} = 15, \quad \sigma = 5.4$$
$$V = \frac{\sigma}{\bar{X}} \times 100 = \frac{5.4}{15} \times 100 = 36\%$$
Example 3.
Calculate coefficient of variation from data in example 3 in section 10.7.
Solution :
$$\bar{X} = 260, \quad \sigma = 124.77$$
$$V = \frac{124.77}{260} \times 100 = \frac{12477}{260} = 47.99\%$$
Illustration 2.
Refer to the Table 10.2 in which the upper limit of the class of the highest value is 100 marks and
the lower limit of the class of the lowest value is 40 marks. Therefore,
$$V_R = \frac{100 - 40}{100 + 40} = \frac{60}{140} = 0.43 \text{ approx.}$$
Illustration 3.
Refer to the example 3 in section 10.4.5 in which the upper limit of the class of the highest value is
₹ 500 and the lower limit of the class of the lowest value is 0. Therefore,
$$V_R = \frac{500 - 0}{500 + 0} = \frac{500}{500} = 1$$
10.10.3 Limitations
First, coefficient of Range is not considered dependable in distributions with extreme values.
Second, it is not possible to calculate coefficient of range in open ended distributions.
10.11 COEFFICIENT OF QUARTILE DEVIATION
10.11.1 Measure
It is defined as the ratio between the difference between the quartiles and the sum of quartiles.
Its formula is
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
This measure is considered more appropriate in open-end distributions and distribution with
extreme values.
Illustration 2.
Refer to illustration in section 10.4.4 in which the values of $Q_1$ and $Q_3$ are 57.3 marks and 78.1
marks respectively. The value of $V_Q$ in this case is
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{78.1 - 57.3}{78.1 + 57.3} = 0.15 \ (= 15\%)$$
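The coefficient of range and the coefficient of quartile deviation share one form: a difference divided by the corresponding sum. A minimal Python sketch, with the hypothetical helper name `difference_over_sum`, covers both.

```python
def difference_over_sum(high, low):
    """(high - low) / (high + low): the form shared by V_R and V_Q."""
    return (high - low) / (high + low)

# Coefficient of range: upper limit 100 marks, lower limit 40 marks.
print(round(difference_over_sum(100, 40), 2))     # 0.43
# Coefficient of quartile deviation: Q3 = 78.1, Q1 = 57.3.
print(round(difference_over_sum(78.1, 57.3), 2))  # 0.15
```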
Solution :
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{20 - 9}{20 + 9} = \frac{11}{29} = 0.38 \ (= 38\%)$$
Example 3.
On the basis of data in Example 3 of Section 9.5 of Chapter 9, calculate the value of coefficient
of Quartile Deviation.
Solution:
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{359.375 - 154.170}{359.375 + 154.170} = \frac{205.205}{513.545} = 0.40 \ (= 40\%)$$
Example 4.
On the basis of data in Example 4 of Section 9.5 of Chapter 9, calculate the coefficient of
Quartile Deviation.
Solution:
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{68.89 - 34.12}{68.89 + 34.12} = \frac{34.77}{103.01} = 0.34 \ (= 34\%)$$
Example 5.
On the basis of data in Example 5 of Section 9.5 of Chapter 9, calculate the coefficient of
Quartile Deviation.
Solution:
$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{70.830 - 34.375}{70.830 + 34.375} = \frac{36.455}{105.205} = 0.35 \ (= 35\%)$$
Refer to the illustration in section 10.5.5 where median = 64 and A.D. = 12.1. The value of $V_{A.D.}$ in
this case:
$$V_{A.D.} = \frac{\text{A.D.}}{\text{Median}} = \frac{12.1}{64} = 0.19 \ (= 19\%)$$
Solved Examples
Example 1.
On the basis of data in Example 1 of Section 10.5.5 in chapter 10, calculate the coefficient of
Mean Deviation.
Solution :
$$V_{A.D.} = \frac{\text{A.D.}}{\bar{X}} = \frac{16.3}{67.5} = 0.24 \ (= 24\%)$$
Example 2.
On the basis of data in Example 2 in Section 10.5.5 of Chapter 10, calculate coefficient of Mean
Deviation.
Solution :
$$V_{A.D.} = \frac{\text{A.D.}}{\bar{X}} = \frac{4.57}{15} = 0.30 \ (= 30\%)$$
Example 3.
On the basis of data in Example 3 in Section 10.5.5 in chapter 10, calculate coefficient of Mean
Deviation.
Solution :
$$V_{A.D.} = \frac{\text{A.D.}}{\bar{X}} = \frac{104.67}{260} = 0.40 \ (= 40\%)$$
Example 4.
On the basis of data in Example 4 of Section 10.5.5 in chapter 10, calculate coefficient of Mean
Deviation.
Solution :
$$V_{A.D.} = \frac{\text{A.D.}}{\bar{X}} = \frac{19.28}{51} = 0.38 \ (= 38\%)$$
LORENZ CURVE
10.13.1 Introduction
Lorenz Curve is a type of measure of dispersion, but more of a graphical measure. It shows how
a distribution deviates from equal distribution. The traditional techniques of dispersion are statistical
techniques. The Lorenz curve is different in the sense that it is a graphical technique.
10.13.2 Meaning
Lorenz Curve shows the extent of departure between an “equal distribution” and the “actual
distribution” of a variable. When applied to income distribution, it shows the extent of departure of
actual distribution of income from an equal distribution. It is known as the best summary measure of
inequality.
When each household gets the same income, the income distribution is said to be equal. But, in
the real world, the income distribution is never equal. Since the circumstances in which households
are placed differ, the income earned by them also differ. Limited inequality is understandable. Large
scale inequality, however, is undesirable. This is why governments take steps to reduce inequality.
To do so they must know the extent of departure from equal distribution. Lorenz Curve technique is
actively used for this purpose.
In the next step, represent the distribution graphically. Represent households on the x-axis and
income on the y-axis. Make the diagram a square box diagram. Join the lower left hand corner and
the upper right hand corner with a straight line.
Let the box be ABCD. The diagonal AC then joins the two corners. Now draw the actual
distributions, which will lie somewhere between AC and ABC.
Interpretation
Note that there are four curves described as follows :
AC : Represents equal distribution, i.e. perfect equality.
ABC : Represents perfect inequality, with all the income going to the last group.
AEC : Actual distribution for the year 2015.
AFC : Actual distribution for the year 2004.
The curves AC and ABC are simply two ‘never to exist’ extremes, AC shows perfect equality,
while ABC shows perfect inequality. Neither of the two situations is possible in the real world. The
actual distribution must lie somewhere between AC and ABC. Of the two, the AC curve is the reference
curve. The farther the actual distribution curve lies from the equal distribution curve, the higher the
degree of inequality.
10.13.4 The Gini Coefficient of Inequality
The Lorenz curve is useful only in ranking inequality between different nations and over
different time periods within the same nation. However, the curve cannot be used to measure the
precise degree of inequality. We can do it with Gini coefficient of inequality. In terms of Lorenz
diagram, it is the numerical value of the area between the Lorenz curve and the diagonal line of
equal distribution, divided by the entire area beneath the diagonal line.
$$\text{Gini coefficient} = \frac{\text{Inequality area}}{\text{Triangle area}}$$
With reference to the year 2015:
$$\text{Gini coefficient} = \frac{\text{Area enclosed by AEC}}{\text{Area enclosed by ABC}}$$
The value of the coefficient may vary from 0 to 1. In case of equal distribution, it will be zero. In case of
perfect inequality it is 1. In case of actual distribution it will lie between 0 and 1.
POINTS TO REMEMBER
• Dispersion means distance from the average.
• Dispersion is of two types : absolute and relative dispersion.
• Main measures of absolute dispersion are : (i) range, (ii) semi-inter quartile range (quartile
deviation), (iii) average deviation and (iv) standard deviation.
• Main measures of relative dispersion are:
(i) Coefficient of quartile deviation
(ii) Coefficient of average deviation
(iii) Coefficient of variation
• Range in the ungrouped data is the difference between the highest and the lowest values in
the series.
• Range, in grouped data, is the difference between the upper limit of the class of the highest
value and the lower limit of the class of the lowest value.
• Semi-inter quartile range (quartile deviation) equals the difference between third and first
quartile divided by 2.
• The two characteristics of semi-inter quartile range are :
(i) Not affected by extreme values
(ii) Possible to find value in case of open end distribution
• Average (mean) deviation (AD) is the average distance (ignoring pluses and minuses) of the
items of a series from their average (arithmetic mean or median). The method of calculation
of AD is the same whether calculated from arithmetic mean or median. The answers may be
different.
• SD in ungrouped data by shortcut method = $\sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2}$
• SD in grouped data by shortcut method = $i \times \sqrt{\frac{\sum f x'^2}{N} - \left(\frac{\sum f x'}{N}\right)^2}$
• There is need for a measure of relative dispersion on account of two limitations of measures
of absolute dispersion : (i) The measures of absolute dispersion are not comparable because
they may be expressed in different units and (ii) a measure of absolute dispersion is not
expressed relative to the magnitude around which the dispersion is measured.
• Coefficient of variation (V) is the most often used measure of relative dispersion.
$$V = \frac{\text{Standard deviation}}{\text{Arithmetic mean}} \times 100 = \frac{\sigma}{\bar{X}} \times 100$$
deviation.
Variable    Frequency
0 - 5       2
5 - 10      5
10 - 15     7
15 - 20     13
20 - 25     2
25 - 30     16
30 - 35     8
35 - 40     3
(Ans. σ = 9.25, V = 42.63%)
11.1.2 Significance
Coefficient of correlation is a measure of degree of association between two variables. It reveals
two things about the possible association.
First, what is the direction of association? If the value is positive, there is direct association, as is
usually the case between the price of a good and the supply of that good. If the value is negative,
there is inverse relation, as is usually the case between the price of a good and the market demand
for that good.
Second, the absolute value of the coefficient indicates how strong the relationship is. The value
varies between zero and one. Zero means no association while one means 100 percent association.
The nearer the value is to one, the stronger the relationship between the two variables.
The correlation coefficient is of great significance to government in policy formulation. For
example, the measure of association between money supply and inflation rate is of great help in
controlling inflation.
It is also of great significance to businessmen. Measures of association between price and demand,
between weather and demand, between age groups and demand, etc. can be of great help in planning
their production activities.
X 1 2 3 4
Y 2 4 6 8
There is a linear correlation between X and Y in the above case. When the two variables change
in different ratios it is a case of non-linear correlation.
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \sqrt{N \sum Y^2 - (\sum Y)^2}} \qquad \ldots\ldots \text{(iii)}$$
The above method is called direct method of calculating coefficient of correlation. It is called
direct because it does not require computation of either mean or standard deviation.
Method (i) and (ii) above are beyond the scope of this chapter. We will study only method (iii)
i.e., direct method. We give below an example of the method.
Example.
Calculate the coefficient of correlation between price of a good and its demand by direct method.
Price (₹) 7 6 5 4 3 2 1
Demand (units) 10 12 15 20 30 40 50
Solution :
Price              Demand
X       X²         Y       Y²        XY
7       49         10      100       70
6       36         12      144       72
5       25         15      225       75
4       16         20      400       80
3       9          30      900       90
2       4          40      1600      80
1       1          50      2500      50
28      140        177     5869      517
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \sqrt{N \sum Y^2 - (\sum Y)^2}}$$
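Substituting the column totals into the direct-method formula can be sketched in Python. This is an illustration only; the function name `pearson_direct` is hypothetical, and raising an error for a series with no variation is a design choice made here, not something the text prescribes.

```python
from math import sqrt

def pearson_direct(xs, ys):
    """Correlation coefficient by the direct method, from raw sums only."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        # One series does not vary at all, so the ratio would be 0/0.
        raise ValueError("correlation is undefined when a series is constant")
    return numerator / denominator

price = [7, 6, 5, 4, 3, 2, 1]
demand = [10, 12, 15, 20, 30, 40, 50]
print(round(pearson_direct(price, demand), 3))  # -0.967
```

For the price and demand data the result is strongly negative, in line with the usual inverse relation between the two.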
Solution :
Demand             Price
X       X²         Y       Y²        XY
10      100        1       1         10
15      225        2       4         30
25      625        3       9         75
40      1600       4       16        160
50      2500       6       36        300
140     5050       16      66        575
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$\gamma = \frac{2875 - 2240}{\sqrt{25250 - 19600} \sqrt{330 - 256}}$$
$$\gamma = \frac{635}{\sqrt{5650} \times \sqrt{74}} = \frac{635}{\sqrt{5650 \times 74}} = \frac{635}{\sqrt{418100}} = \frac{635}{646.6} = +0.98$$
In the above example γ = + 0.98. It shows that there is a very high degree of positive correlation
between demand for a good and its price. In this example demand is a cause and price is the effect.
11.3.5 Value of Coefficient of Correlation (γ )
The value of γ varies between +1 and -1. If γ = +1, it implies perfect positive correlation. If γ = -
1, it implies perfect negative correlation. If γ = 0, it implies absence of correlation. Nearer is the
value of γ to +1, higher is the degree of positive correlation. Nearer is the value of γ to -1, higher is
the degree of negative correlation.
In case of negative correlation, when we compare the values of coefficients we ignore the minus
sign. We compare the values in absolute terms. For example, if γ₁ = -0.6 and γ₂ = -0.8, then γ₂ is
taken as greater than γ₁ (while mathematically γ₁ is greater than γ₂). This point must be kept in mind
in comparing the values of γ.
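$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$\gamma = \frac{14590 - 12285}{\sqrt{14750 - 11025} \sqrt{15210 - 13689}}$$
$$\gamma = \frac{2305}{\sqrt{3725} \times \sqrt{1521}} = \frac{2305}{\sqrt{3725 \times 1521}} = \frac{2305}{\sqrt{5665725}} = \frac{2305}{2380.27} = +0.97$$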
Example 2.
Calculate correlation coefficient between X series and Y series.
X 10 8 5 11 7 4 2
Y 5 9 4 14 0 5 3
Solution :
X       X²        Y       Y²       XY
10      100       5       25       50
8       64        9       81       72
5       25        4       16       20
11      121       14      196      154
7       49        0       0        0
4       16        5       25       20
2       4         3       9        6
47      379       40      352      322
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$\gamma = \frac{2254 - 1880}{\sqrt{2653 - 2209} \sqrt{2464 - 1600}}$$
$$\gamma = \frac{374}{\sqrt{444} \sqrt{864}} = \frac{374}{\sqrt{383616}} = \frac{374}{619.37} = +0.60$$
Example 3.
Calculate the coefficient of correlation between the two series :
X 1 2 3 4 5 6
Y 2 4 6 8 10 12
Solution :
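X       X²        Y       Y²       XY
1       1         2       4        2
2       4         4       16       8
3       9         6       36       18
4       16        8       64       32
5       25        10      100      50
6       36        12      144      72
21      91        42      364      182
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$\gamma = \frac{(6 \times 182) - (21 \times 42)}{\sqrt{(6 \times 91) - (21)^2} \sqrt{(6 \times 364) - (42)^2}}$$
$$\gamma = \frac{1092 - 882}{\sqrt{546 - 441} \sqrt{2184 - 1764}}$$
$$\gamma = \frac{210}{\sqrt{105} \times \sqrt{420}} = \frac{210}{\sqrt{105 \times 420}} = \frac{210}{\sqrt{44100}} = \frac{210}{210} = +1$$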
Example 4.
Calculate the coefficient of correlation between X and Y.
X 1 2 3 4 5
Y 3 3 3 3 3
Solution :
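X       X²        Y       Y²       XY
1       1         3       9        3
2       4         3       9        6
3       9         3       9        9
4       16        3       9        12
5       25        3       9        15
15      55        15      45       45
$$\gamma = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$\gamma = \frac{(5 \times 45) - (15 \times 15)}{\sqrt{(5 \times 55) - (15)^2} \sqrt{(5 \times 45) - (15)^2}}$$
$$\gamma = \frac{225 - 225}{\sqrt{275 - 225} \sqrt{225 - 225}} = \frac{0}{\sqrt{50} \times \sqrt{0}} = \frac{0}{0}$$
Strictly, the ratio 0/0 is indeterminate. It arises because Y does not vary at all. Since Y shows no
movement that could be associated with X, γ is taken as 0, i.e., absence of correlation.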
11.5 RANK CORRELATION
11.5.1 Introduction
The measure of rank correlation was developed by Charles Edward Spearman. It is an
alternative to Karl Pearson’s product-moment method. There are three main uses of this measure :
1. Its computation is quicker, especially when the number of items is small.
2. It is useful for series of data that are given as ranks, scores, standings, etc.
3. It permits us to correlate two sets of qualitative observations which are subject to ranking.
11.5.2 Method
Suppose we are given two series of data with variables X and Y. Let these two variables be the
marks obtained by a group of 10 students in English (X) and Hindi (Y)
Marks obtained by students
S.No.   Marks in English      Marks in Hindi
        (X)      Ranks        (Y)      Ranks
(1)     (2)      (3)          (4)      (5)
1       80       1            60       4
2       75       2            75       1
3       66       3            40       7
4       60       4            36       8
5       54       5            65       3
6       50       6            74       2
7       43       7            43       6
8       38       8            30       9
9       30       9            50       5
10      10       10           20       10
To calculate rank correlation from the above data the following steps are required to be taken.
Steps
1. Arrange both the series in ascending or in descending order of value in the original data.
2. Suppose we arrange the series in descending order. First write the X series in the descending
order (Column 2) and also write ranks (Column 3). Then write the corresponding rank as it
appears in the Y-series against each ranked item of X-series (Column 5) In our illustration
student No. 1 gets first rank in English but 4th rank in Hindi. It makes one pair of rank, i.e., 1
and 4. Like this we write all the pairs of ranks. (Col. 3 and 5)
Note that in the above table no value appears more than once. Therefore, no rank is repeated. This is
the “unrepeated ranks case”.
What happens if a value in a series appears more than once? If so, the method of ranking
changes. Let us take an example. Refer to the following:
S.No.   Marks in English         Marks in Hindi
        Marks (X)   Rank (X)     Marks (Y)    Rank (Y)
(1)     (2)         (3)          (4)          (5)
1       20          1            5  (10)      10
2       18          2            10 (5)       5.5
3       15          3            15 (2)       2.5
4       14          4            9  (7)       7
5       12          5            16 (1)       1
6       9           6            11 (4)       4
7       8           7            15 (3)       2.5
8       5           8            10 (6)       5.5
9       4           9            6  (9)       9
10      0           10           8  (8)       8
Steps
1. Arrange X-series in descending order. It is already so in the above table (Col. 2). There are no
repeated values in this series.
2. Assign ranks to series X (Col. 3).
3. Assign ranks to series Y in descending order of value, ignoring repeated values. It is done in
brackets. This is incorrect ranking. Why should two same values get different ranks? To correct
the ranking take the 4th step.
4. Serial Nos. 3 and 7 in series Y have the same value, i.e., 15. Find the average of the
incorrect ranks 2 and 3, which equals 2.5, i.e., $\frac{2+3}{2}$. Assign this rank to both serial nos. 3 and 7. In the
same manner assign the rank 5.5, i.e., $\frac{5+6}{2}$, to serial numbers 2 and 8. The rest of the serial
numbers in series Y will hold the ranks as given in brackets.
The above is the “repeated ranks case”.
The method of calculation of rank correlation differs in “repeated ranks case” from “unrepeated
ranks case” The two methods are the following:
(a) Unrepeated Ranks Case
Apply the following formula
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
where, ρ = rank correlation coefficient (the Greek letter “rho”)
D = Difference between each pair of ranks
N = Number of paired items.
Computation of Rank Correlation
S.No.   Marks Obtained       Rank Obtained        D          D²
        English   Hindi      English   Hindi
        (X)       (Y)        (X)       (Y)        (X - Y)    (X - Y)²
1       2         3          4         5          6          7
1       80        60         1         4          -3         9
2       75        75         2         1          1          1
3       66        40         3         7          -4         16
4       60        36         4         8          -4         16
5       54        65         5         3          2          4
6       50        74         6         2          4          16
7       43        43         7         6          1          1
8       38        30         8         9          -1         1
9       30        50         9         5          4          16
10      10        20         10        10         0          0
N = 10                                                       ∑D² = 80
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 80}{10(100 - 1)} = 1 - \frac{480}{990} = 1 - 0.48 = +0.52$$
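The unrepeated ranks calculation can be sketched in Python. The helper names `rank_descending` and `spearman_no_ties` are hypothetical, and the sketch assumes that no value is repeated within a series.

```python
def rank_descending(values):
    """Rank 1 for the highest value; assumes no repeated values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_no_ties(xs, ys):
    """Return (sum of D^2, rank correlation) for two series without repeated values."""
    n = len(xs)
    rank_x, rank_y = rank_descending(xs), rank_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

english = [80, 75, 66, 60, 54, 50, 43, 38, 30, 10]
hindi = [60, 75, 40, 36, 65, 74, 43, 30, 50, 20]
sum_d2, rho = spearman_no_ties(english, hindi)
print(sum_d2, round(rho, 2))  # 80 0.52
```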
(b) Repeated Ranks Case
Apply the following formula:
$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \ldots\right]}{N(N^2 - 1)}$$
where m is the number of items sharing a repeated rank. There are as many
correction factors as there are cases of repeated ranks. In our example given below there are
two cases of repeated ranks. So there are two correction factors as shown in the following
calculation.
Computation of Rank Correlation
S.No.   Marks Obtained            Rank Obtained             D          D²
        Economics   Commerce      Economics   Commerce
        (X)         (Y)           (X)         (Y)           (X - Y)    (X - Y)²
1       2           3             4           5             6          7
1       20          5             1           10.0          -9.0       81.00
2       18          10            2           6.5           -4.5       20.25
3       15          15            3           3.5           -0.5       0.25
4       14          9             4           8.0           -4.0       16.00
5       12          8             5           9.0           -4.0       16.00
6       9           16            6           2.0           4.0        16.00
7       8           11            7           5.0           2.0        4.00
8       5           15            8           3.5           4.5        20.25
9       4           10            9           6.5           2.5        6.25
10      0           18            10          1.0           9.0        81.00
N = 10                                                                 ∑D² = 261.00
Substituting values, we get:
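$$\rho = 1 - \frac{6\left[261 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10(100 - 1)}$$
$$= 1 - \frac{6\left[261 + \frac{1}{12}(6) + \frac{1}{12}(6)\right]}{10 \times 99}$$
$$= 1 - \frac{6(261 + 0.5 + 0.5)}{990} = 1 - \frac{6 \times 262}{990} = 1 - \frac{1572}{990} = -0.588$$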
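The repeated ranks calculation can be sketched in Python as follows. The helper names `average_ranks` and `spearman_with_ties` are hypothetical; tied values receive the average of the ranks they would otherwise occupy, and one correction factor is added for every group of repeated values in either series.

```python
from collections import Counter

def average_ranks(values):
    """Descending ranks; repeated values share the average of their ranks."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1      # first rank occupied by this value
        count = order.count(v)          # how many items share the value
        ranks.append(first + (count - 1) / 2)
    return ranks

def spearman_with_ties(xs, ys):
    """Rank correlation with a correction factor (m^3 - m)/12 per repeated group."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = 0.0
    for series in (xs, ys):
        for m in Counter(series).values():
            if m > 1:
                correction += (m ** 3 - m) / 12
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

economics = [20, 18, 15, 14, 12, 9, 8, 5, 4, 0]
commerce = [5, 10, 15, 9, 8, 16, 11, 15, 10, 18]
print(round(spearman_with_ties(economics, commerce), 3))  # -0.588
```

When no value is repeated the correction is zero and the function reduces to the unrepeated ranks formula.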
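$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \ldots\right]}{N(N^2 - 1)}$$
$$= 1 - \frac{6\left[4 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10(10^2 - 1)}$$
$$= 1 - \frac{6\left[4 + \frac{1}{12}(6) + \frac{1}{12}(6)\right]}{10 \times 99}$$
$$= 1 - \frac{6(4 + 0.5 + 0.5)}{990} = 1 - \frac{6 \times 5}{990} = 1 - \frac{30}{990} = 1 - 0.03 = +0.97$$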
Example 2.
Calculate rank correlation coefficient from the data in example 2 in section 11.4
Solution:
S.No.   X     Y     X-rank   Y-rank   D       D²
1       2     3     1        2        -1      1
2       4     5     2        4.5      -2.5    6.25
3       5     4     3        3        0       0
4       7     0     4        1        3       9
5       8     9     5        6        -1      1
6       10    5     6        4.5      1.5     2.25
7       11    14    7        7        0       0
N = 7                                         ∑D² = 19.5
$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \ldots\right]}{N(N^2 - 1)}$$
$$= 1 - \frac{6\left[19.5 + \frac{1}{12}(2^3 - 2)\right]}{7(7^2 - 1)} = 1 - \frac{6\left[19.5 + \frac{1}{12}(6)\right]}{7 \times 48}$$
$$= 1 - \frac{6(19.5 + 0.5)}{336} = 1 - \frac{6 \times 20}{336} = 1 - \frac{120}{336} = 1 - 0.36 = +0.64$$
Example 3.
Calculate rank correlation coefficient from data in Example 3 in Section 11.4.
Solution :
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 0}{6(36 - 1)} = 1 - \frac{0}{210} = 1 - 0 = +1$$
Example 4
Calculate rank correlation coefficient from data in Example 4 in Section 11.4.
Solution :
S.No.   X     Y     X-rank   Y-rank   D     D²
1       1     3     5        3        2     4
2       2     3     4        3        1     1
3       3     3     3        3        0     0
4       4     3     2        3        -1    1
5       5     3     1        3        -2    4
N = 5                                       ∑D² = 10
$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \ldots\right]}{N(N^2 - 1)}$$
$$= 1 - \frac{6\left[10 + \frac{1}{12}(5^3 - 5)\right]}{5(5^2 - 1)} = 1 - \frac{6\left[10 + \frac{1}{12} \times (125 - 5)\right]}{5 \times 24}$$
$$= 1 - \frac{6(10 + 10)}{120} = 1 - \frac{120}{120} = 1 - 1 = 0$$
POINTS TO REMEMBER
• A measure of correlation is an expression of quantitative relationship between two
variables. However, it does not indicate any cause and effect relationship between the
variables.
• Study of relation between two variables is called simple correlation. Study of relation
between more than two variables is called multiple correlation.
• When both the variables move in the same direction, it is called positive correlation.
When the two variables move in opposite directions, it is called negative correlation.
• When both the variables change in the same proportion, it is called linear correlation.
When the two variables change in different ratios, it is called non-linear correlation.
• Scatter diagram is a graphic method which indicates correlation but does not calculate the
exact degree.
• The basic method of calculation of coefficient of correlation ( γ ) is the product moment
method given by Karl Pearson.
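• Rank correlation (unrepeated ranks) : $\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$
• Rank correlation (repeated ranks) : $\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \ldots\right]}{N(N^2 - 1)}$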
EXERCISES
MULTIPLE CHOICE QUESTIONS [1 Mark]
(Answers at the end of exercises)
Choose the correct alternative in the following questions:
1. Spot the correct statement about simple correlation and partial correlation:
(a) Both study relation between only two variables.
(b) Both establish cause and effect relationship.
(c) Simple correlation takes into consideration other variables.
(d) Partial correlation confines to only two variables.
2. When X falls, Y also falls. There is perfect correlation between the two. The correlation
coefficient between the two is
(a) Zero (b) Infinity (c) +1 (d) -1
3. Correlation between two variables is
(a) Proof of relationship between two variables.
(b) Simply some pointer of relationship between two variables.
(c) Neither (a) nor (b)
(d) Either (a) or (b)
NUMERICAL QUESTIONS
1. Calculate coefficient of correlation between series X and Y.
X    10    8    6    4    2
Y    2     4    6    8    10     (Ans. - 1)
2. Calculate coefficient of correlation between series A and B.
A    1    2    3    4    5    6    7
B    4    4    4    4    4    4    4     (Ans. 0)
3. Calculate coefficient of correlation between age of wives and age of husbands.
Age of wives       20    21    19    25    24
Age of husbands    24    24    21    30    24     (Ans. + 0.81)
4. Calculate coefficient of correlation between prices and quantity supplied at each price.
Price (₹)         1     2     3     4     5
Supply (units)    10    12    14    20    25     (Ans. + 0.97)

Country          Happiness Score *     Total No. of
                 (out of max. 10)      medals won
USA              7.119                 119
Great Britain    6.867                 66
China            5.140                 70
Russia           5.716                 56
Germany          6.750                 42
Japan            5.987                 41
France           6.575                 40
South Korea      5.984                 21
Italy            5.948                 28
Australia        7.284                 29
* Source : World Happiness Report (2015) : New York : Sustainable Development Solutions Network.

Calculate coefficient of correlation between PPP Gross National Income (GNI) per capita and
total number of medals (Gold + Silver + Bronze) won by 10 winner countries in Rio (Brazil)
Olympics 2016.
Country          PPP GNI * per capita (2013)    Total No. of
                 (thousand dollars)             medals won
USA              52.3                           119
Great Britain    35.0                           66
China            11.5                           70
Russia           22.6                           56
Germany          43.0                           42
Japan            36.7                           41
France           36.6                           40
South Korea      30.3                           21
Italy            32.7                           28
Australia        41.5                           29
(Ans. + 0.214)
* Source : Human Development Report 2014.
Take, for example, the General Index of Wholesale Prices of India :
TABLE 12.1
General Index of Wholesale Prices
(Base : 2011-12 = 100)
Year        Index
2011-12     100
2012-13     111
2013-14     122
2014-15     125
2015-16     125
2016-17     129
(Source : Economic Survey 2016-17, Government of India, Ministry of Finance)
The index of the year 2011-12 is 100. The reference year, i.e., the base year, here 2011-12, is always
taken as 100. Now, compare 2016-17 with 2015-16. The index of 2016-17 shows that there is a 3.2%
increase in the wholesale prices during 2016-17. Remember, this 3.2% change is an average
change, and it does not indicate that the price of every good or service has increased by 3.2%. In this
sense an index is a statistical average.
When we compare the index of one year with the index of the base year, it tells us about the
percentage change between the base year and the year of comparison. For example, the index of
2016-17 is 129. It indicates that between 2011-12 and 2016-17, i.e., over the 5 year period, the
average wholesale prices of goods in India have increased by 29%. Note that this entire increase is
not during one year but over a period of 5 years. In this sense an index number measures changes
over a period of time.
However, for year to year comparison we have to make some more statistical calculations.
Suppose we want to find out the change in price level between 2015-16 and 2016-17. The indices of
the two years are 125 and 129 respectively. We can find out the change during 2016-17 in the
following way.
$$\frac{129}{125} \times 100 = 103.2$$
The calculation shows that the wholesale price level during the year 2016-17 increased by 3.2%.
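The year to year comparison can be sketched in Python. The function name `year_on_year_change` is hypothetical; the index values are those of the General Index of Wholesale Prices quoted in this section.

```python
def year_on_year_change(index_current, index_previous):
    """Return (current index relative to the previous year, percentage change)."""
    relative = index_current / index_previous * 100
    return relative, relative - 100

relative, change = year_on_year_change(129, 125)
print(round(relative, 1), round(change, 1))  # 103.2 3.2

# Comparison with the base year (index 100) needs no extra step.
print(129 - 100)  # 29 per cent over the whole period
```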
Also, note that it is a general price index. By general it is meant that it includes nearly all
commodities in its scope, like food articles, non-food articles, manufactured products, etc. In this
way an index number shows changes in a group of related variables.
12.2.2 Significance of Index Numbers
An index number measures relative changes over a period of time. There are many index
numbers : Wholesale Price Index (WPI), Consumer Price Index (CPI), Index of Industrial Production
(IIP), Index of Agricultural Production and so on. Each has its own significance. The significance of
WPI, CPI and IIP is explained in section 12.5, 12.6 and 12.7 respectively. We explain here some of
the general uses of index numbers.