Professional Documents
Culture Documents
7 Correlation
2015-16
92 STATISTICS FOR ECONOMICS
2015-16
CORRELATION 93
2015-16
94 STATISTICS FOR ECONOMICS
ship. If all the points lie on a line, the A careful observation of the scatter
correlation is perfect and is said to be diagram gives an idea of the nature
unity. If the scatter points are widely and intensity of the relationship.
dispersed around the line, the
correlation is low. The correlation is Karl Pearson’s Coefficient of
said to be linear if the scatter points lie Correlation
near a line or on a line. This is also known as product moment
Scatter diagrams spanning over correlation and simple correlation
Fig. 7.1 to Fig. 7.5 give us an idea of coefficient. It gives a precise numerical
the relationship between two value of the degree of linear
variables. Fig. 7.1 shows a scatter relationship between two variables X
a round an upward rising line and Y. The linear relationship may be
given by
indicating the movement of the
Y = a + bX
variables in the same direction. When
This type of relation may be
X rises Y will also rise. This is positive described by a straight line. The
correlation. In Fig. 7.2 the points are intercept that the line makes on the
found to be scattered around a Y-axis is given by a and the slope of
downward sloping line. This time the the line is given by b. It gives the
variables move in opposite directions. change in the value of Y for very small
When X rises Y falls and vice versa. change in the value of X. On the other
This is negative correlation. In Fig.7.3 hand, if the relation cannot be
there is no upward rising or downward represented by a straight line as in
sloping line around which the points Y = X2
are scattered. This is an example of the value of the coefficient will be zero.
no correlation. In Fig. 7.4 and Fig. 7.5 It clearly shows that zero correlation
the points are no longer scattered need not mean absence of any type
around an upward rising or downward of relation between the two variables.
falling line. The points themselves are Let X1, X2, ..., XN be N values of X
on the lines. This is referred to as and Y1, Y2 ,..., YN be the corresponding
perfect positive correlation and perfect values of Y. In the subsequent
negative correlation respectively. presentations the subscripts
indicating the unit are dropped for the
sake of simplicity. The arithmetic
Activity
means of X and Y are defined as
• Collect data on height, weight
and marks scor ed by students ΣX ΣY
in your class in any two subjects
X= ; Y=
N N
in class X. Draw the scatter
and their variances are as follows
diagram of these variables taking
two at a time. What type of Σ( X − X ) 2 ΣX 2
relationship do you find? σ 2x = = − X2
N N
2015-16
CORRELATION 95
2015-16
96 STATISTICS FOR ECONOMICS
r = Σxy ...(1)
Nσ x σ y
or
Σ( X − X ) ( Y − Y)
r = ...(2)
Σ( X − X ) 2 Σ( Y − Y) 2
or
(ΣX )(Σ Y)
ΣXY − • If r is positive the two variables
r= N move in the same direction. When
(Σ X ) 2 (Σ Y) 2 ...(3) the price of coffee, a substitute of
ΣX 2 − Σ Y2 −
N N tea, rises the demand for tea also
or rises. Improvement in irrigation
facilities is associated with higher
NΣXY – ( ∑ X)( ∑ Y) yield. When temperature rises the
r=
NΣX 2 – ( ΣX ) 2 • NΣY2 – ( ΣY) 2 ...(4) sale of ice-creams becomes brisk.
2015-16
CORRELATION 97
2015-16
98 STATISTICS FOR ECONOMICS
From Table 7.1 we get, education, higher will be the yield per
acre. It underlines the importance of
Σ xy = 42, farmers’ education.
To use formula (3)
Σ( X − X )2 112
σx = = ,
N 7 (ΣX )(ΣY )
ΣXY −
r= N
Σ( Y − Y)2 38 (ΣX )2 (ΣY )2 ...(3)
σy = = ΣX2 − ΣY 2 −
N 7 N N
Substituting these values in the value of the following expressions
formula (1) have to be calculated i.e.
42 ΣXY, ΣX 2 , ΣY 2 .
r= = 0 .644
112 38 Now apply formula (3) to get the
7 value of r.
7 7
The same value can be obtained Let us know the interpretation of
from formula (2) also. different values of r. The correlation
coefficient between marks secured in
Σ ( X − X )( Y − Y ) English and Statistics is, say, 0.1. It
r= ...(2)
Σ ( X − X )2 Σ ( Y − Y )2 means that though the marks secured
in the two subjects are positively
42 correlated, the strength of the
r= = 0 .644
112 38 relationship is weak. Students with high
Thus years of education of farmers marks in English may be getting
and annual yield per acre are relatively low marks in statistics. Had
positively correlated. The value of r is the value of r been, say, 0.9, students
also large. It implies that more the with high marks in English will
number of years farmers invest in invariably get high marks in Statistics.
TABLE 7.1
Calculation of r between years of schooling of far mers and annual yield
Years of (X– X ) (X– X ) 2 Annual yield (Y– Y ) (Y– Y ) 2 (X– X )(Y– Y )
Education per acre in ’000 Rs
(X) (Y)
0 –6 36 4 –3 9 18
2 –4 16 4 –3 9 12
4 –2 4 6 –1 1 2
6 0 0 10 3 9 0
8 2 4 10 3 9 6
10 4 16 8 1 1 4
12 6 36 7 0 0 0
2015-16
CORRELATION 99
2015-16
100 STATISTICS FOR ECONOMICS
ΣUV −
( ΣU )( ΣU ) rods nor weighing scales are available.
The students can be easily ranked in
r= N terms of height and weight without
(3)
( ΣU ) ( ΣV )
2 2
using measuring rods and weighing
ΣU − 2
ΣV − 2
scales.
N N There are also situations when you
are required to quantify qualities such
41 × 35 as fairness, honesty etc. Ranking may
378 − be a better alternative to quantifica-
= 5
tion of qualities. Moreover, sometimes
( 41) 2 ( 35)2
423 − 343 − the correlation coefficient between two
5 5 variables with extreme values may be
quite different from the coefficient
= 0.98 without the extreme values. Under
these circumstances rank correlation
The strong positive correlation provides a better alternative to simple
between price index and money correlation.
supply is an important premise of Rank correlation coefficient and
monetary policy. When the money simple correlation coefficient have the
supply grows the price index also rises. same interpretation. Its formula has
2015-16
CORRELATION 101
been derived from simple correlation not utilised. The first differences of
coefficient where individual values have the values of items in the series,
been replaced by ranks. These ranks arranged in order of magnitude, are
are used for the calculation of almost never constant. Usually the
correlation. This coefficient provides data cluster around the central
a measure of linear association between values with smaller differences in the
ranks assigned to these units, not their middle of the array. If the first
values. It is the Product Moment differences were constant, then r and
Correlation between the ranks. Its r k would give identical results. The
formula is first difference is the difference of
...(4) consecutive values. Rank
correlation is preferred to Pearsonian
where n is the number of observations coefficient when extreme values are
and D the deviation of ranks assigned present. In general
to a variable from those assigned to rk is less than or equal to r.
the other variable. When the ranks are The calculation of rank correlation
repeated the formula is will be illustrated under thre e
situations.
( m 31 − m1 ) ( m 32 − m 2 ) 1. The ranks are given.
6 ΣD2 + + + ...
2. The ranks are not given. They have
r s = 1–
12 12
n( n2 −1 ) to be worked out from the data.
3. Ranks are repeated.
where m1, m2, ..., are the number of
m31 m1 Case 1: When the ranks are given
repetitions of ranks and ..., Example 3
12
their corresponding corr ection Five persons are assessed by three
factors. This correction is needed for judges in a beauty contest. We have
every repeated value of both variables. to find out which pair of judges has
If three values are repeated, there will the nearest approach to common
be a correction for each value. Every perception of beauty.
time m1 indicates the number of times Competitors
a value is repeated.
All the properties of the simple Judge 1 2 3 4 5
correlation coefficient are applicable A 1 2 3 4 5
here. Like the Pearsonian Coefficient B 2 4 1 5 3
C 1 3 5 2 4
of correlation it lies between 1 and
–1. However, generally it is not as There are 3 pairs of judges
accurate as the ordinary method. necessitating calculation of rank
This is due the fact that all the correlation thrice. Formula (4) will be
information concerning the data is used.
2015-16
102 STATISTICS FOR ECONOMICS
2015-16
CORRELATION 103
common rank is the mean of the ranks The necessary correction thus is
which those items would have assumed m3 − m 2 3 − 2 1
if they were slightly different from each = =
12 12 2
other. The next item will be assigned
Using this equation
the rank next to the rank already
assumed. The formula of Spearman’s (m 3 − m)
6 Σ D 2 +
rs = 1 −
rank correlation coefficient when the 12 ...(5)
ranks are repeated is as follows n3 − n
Substituting the values of these
rs = 1 −
expressions
( m31 − m1) ( m32 − m2 )
6 ΣD 2 + + + ... 6(65. 5 + 0. 5) 396
12 12 rs = 1 − = 1−
n(n − 1)
2 8 −8
3 504
=1 − 0 .786 = 0 .214
where m 1, m 2, ..., are the number
of repetitions of ranks and Thus there is positive rank correlation
between X and Y. Both X and Y move
m31 − m1 in the same direction. However, the
..., their corresponding
12 relationship cannot be described as
correction factors. strong.
X has the value 35 both at the
Activity
4th and 5th rank. Hence both are
• Collect data on marks scored by
given the average rank i.e.,
10 of your classmates in class IX
4+5 th and X examinations. Calculate the
th = 4. 5 rank
2 rank correlation coefficient
between them. If your data do not
have any repetition, repeat the
X Y Rank of Rank of Deviation in D 2 exercise by taking a data set
XR' YR'' Ranking having repeated ranks. What are
D=R'–R''
the circumstances in which rank
25 55 6 2 4 16 correlation coefficient is preferred
45 80 1 1 0 0 to simple correlation coefficient? If
data are precisely measured will
35 30 4.5 8 3.5 12.25
you still prefer rank correlation
40 35 3 7 –4 16 coefficient to simple correlation?
15 40 8 5 3 9 When can you be indifferent to the
19 42 7 4 3 9 choice? Discuss in class.
35 36 4.5 6 –1.5 2.25
4. CONCLUSION
42 48 2 3 –1 1
We have discussed some techniques
Total ΣD = 65 .5
for studying the relationship between
2015-16
104 STATISTICS FOR ECONOMICS
Recap
• Correlation analysis studies the relation between two variables.
• Scatter diagrams give a visual presentation of the nature of
relationship between two variables.
• Karl Pearson’s coefficient of correlation r measures numerically only
linear relationship between two variables. r lies between –1 and 1.
• When the variables cannot be measured pr ecisely Spearman’s rank
correlation can be used to measure the linear relationship
numerically.
• Repeated ranks need correction factors.
• Correlation does not mean causation. It only means
covariation.
EXERCISES
1. The unit of correlation coef ficient between height in feet and weight in
kgs is
(i) kg/feet
(ii) percentage
(iii) non-existent
2. The range of simple correlation coefficient is
(i) 0 to infinity
(ii) minus one to plus one
(iii) minus infinity to infinity
3. If rxy is positive the relation between X and Y is of the type
(i) When Y incr eases X increases
(ii) When Y decreases X incr eases
(iii) When Y increases X does not change
2015-16
CORRELATION 105
2015-16
106 STATISTICS FOR ECONOMICS
Activity
• Use all the formulae discussed here to calculate r between
India’s national income and exports taking at least ten
observations.
2015-16