Professional Documents
Culture Documents
11 Stat 7 Correlation PDF
11 Stat 7 Correlation PDF
7 Correlation
ship. If all the points lie on a line, the Inspection of the scatter diagram
correlation is perfect and is said to be gives an idea of the nature and
unity. If the scatter points are widely intensity of the relationship.
dispersed around the line, the
correlation is low. The correlation is Karl Pearson’s Coef ficient of
said to be linear if the scatter points Correlation
lie near a line or on a line. This is also known as product moment
Scatter diagrams spanning over correlation and simple correlation
Fig. 7.1 to Fig. 7.5 give us an idea of coefficient. It gives a precise numerical
the relationship between two value of the degree of linear
variables. Fig. 7.1 shows a scatter relationship between two variables X
around an upward rising line and Y. The linear relationship may be
given by
indicating the movement of the
Y = a + bX
variables in the same direction. When
This type of relation may be
X rises Y will also rise. This is positive
described by a straight line. The
correlation. In Fig. 7.2 the points are intercept that the line makes on the
found to be scattered around a Y-axis is given by a and the slope of
downward sloping line. This time the the line is given by b. It gives the
variables move in opposite directions. change in the value of Y for very small
When X rises Y falls and vice versa. change in the value of X. On the other
This is negative correlation. In Fig.7.3 hand, if the relation cannot be
there is no upward rising or downward represented by a straight line as in
sloping line around which the points Y = X2
are scattered. This is an example of the value of the coefficient will be zero.
no correlation. In Fig. 7.4 and Fig. 7.5 It clearly shows that zero correlation
the points are no longer scattered need not mean absence of any type
around an upward rising or downward of relation between the two variables.
falling line. The points themselves are Let X1, X2, ..., XN be N values of X
on the lines. This is referred to as and Y1, Y2 ,..., YN be the corresponding
perfect positive correlation and perfect values of Y. In the subsequent
negative correlation respectively. presentations the subscripts
indicating the unit are dropped for the
Activity
sake of simplicity. The arithmetic
means of X and Y are defined as
• Collect data on height, weight
ΣX ΣY
and marks scored by students
in your class in any two subjects
X= ; Y=
N N
in class X. Draw the scatter and their variances are as follows
diagram of these variables taking
two at a time. What type of Σ( X - X )2 ΣX 2
relationship do you find? s2 x = = - X2
N N
CORRELATION 95
96 STATISTICS FOR ECONOMICS
r = Σxy Ns s ...(1)
x y
or
Σ( X - X ) ( Y - Y )
r= ...(2)
Σ( X - X )2 Σ( Y - Y )2
or
(ΣX )(ΣY )
ΣXY - • If r is positive the two variables
r= N move in the same direction. When
(ΣX ) 2 (ΣY ) 2 ...(3) the price of coffee, a substitute of
ΣX 2 - ΣY 2 -
N N tea, rises the demand for tea also
rises. Improvement in irrigation
or
NΣXY (ΣX )(ΣY ) facilities is associated with higher
r= yield. When temperature rises the
NΣX 2 (ΣX )2 NΣY 2 (ΣY )2 ...(4) sale of ice-creams becomes brisk.
CORRELATION 97
From Table 7.1 we get, education, higher will be the yield per
acre. It underlines the importance of
Σxy = 42, farmers’ education.
To use formula (3)
Σ( X - X )2 112
sx = = ,
N 7 (ΣX )(ΣY )
ΣXY -
r= N
Σ( Y - Y )2 38 (ΣX ) 2 (ΣY ) 2 ...(3)
sy = = ΣX 2 - ΣY 2 -
N 7 N N
Substituting these values in the value of the following expressions
formula (1) have to be calculated i.e.
42 ΣXY, ΣX 2 , ΣY 2 .
r= = 0.644
112 38 Now apply formula (3) to get the
7 value of r.
7 7
The same value can be obtained Let us know the interpretation of
from formula (2) also. different values of r. The correlation
coefficient between marks secured in
Σ ( X - X )( Y - Y ) English and Statistics is, say, 0.1. It
r= ...(2)
Σ ( X - X )2 Σ ( Y - Y )2 means that though the marks secured
in the two subjects are positively
42 correlated, the strength of the
r= = 0.644
112 38 relationship is weak. Students with high
Thus years of education of the marks in English may be getting
farmers and annual yield per acre are relatively low marks in statistics. Had
positively correlated. The value of r is the value of r been, say, 0.9, students
also large. It implies that more the with high marks in English will
number of years farmers invest in invariably get high marks in Statistics.
TABLE 7.1
Calculation of r between years of schooling of farmers and annual yield
Years of (X– X ) (X– X ) 2 Annual yield (Y– Y ) (Y– Y )2 (X– X )(Y– Y )
Education per acre in ’000 Rs
(X) (Y)
0 –6 36 4 –3 9 18
2 –4 16 4 –3 9 12
4 –2 4 6 –1 1 2
6 0 0 10 3 9 0
8 2 4 10 3 9 6
10 4 16 8 1 1 4
12 6 36 7 0 0 0
been derived from simple correlation concerning the data is not utilised.
coefficient where individual values The first differences of the values of
have been replaced by ranks. These the items in the series, arranged in
ranks are used for the calculation of order of magnitude, are almost never
correlation. This coefficient provides constant. Usually the data cluster
a measure of linear association around the central values with smaller
between ranks assigned to these differences in the middle of the array.
units, not their values. It is the If the first differences were constant
Product Moment Correlation between then r and r k would give identical
the ranks. Its formula is results. The first difference is the
difference of consecutive values.
6ΣD
2
rk = 1 ...(4) Rank correlation is preferred to
n3 n Pearsonian coefficient when extreme
where n is the number of observations values are present. In general
and D the deviation of ranks assigned rk is less than or equal to r.
to a variable from those assigned to The calculation of rank correlation
the other variable. When the ranks are will be illustrated under three
repeated the formula is situations.
rk = 1– 1. The ranks are given.
2. The ranks are not given. They have
È ( m 31 - m1 ) ( m 32 - m 2 ) ˘
6 ÍΣD2 + + + ...˙ to be worked out from the data.
Î 12 12 ˚ 3. Ranks are repeated.
n( n - 1)
2
where m1, m2, ..., are the number of Case 1: When the ranks are given
m 31 m1 Example 3
repetitions of ranks and ...,
12 Five persons are assessed by three
their corresponding correction judges in a beauty contest. We have
factors. This correction is needed for to find out which pair of judges has
every repeated value of both variables. the nearest approach to common
If three values are repeated, there will perception of beauty.
be a correction for each value. Every Competitors
time m1 indicates the number of times
Judge 1 2 3 4 5
a value is repeated.
All the properties of the simple A 1 2 3 4 5
B 2 4 1 5 3
correlation coefficient are applicable C 1 3 5 2 4
here. Like the Pearsonian Coefficient
of correlation it lies between 1 and There are 3 pairs of judges
–1. However, generally it is not as necessitating calculation of rank
accurate as the ordinary method. This correlation thrice. Formula (4) will be
is due the fact that all the information used —
102 STATISTICS FOR ECONOMICS
Î 12 12 ˚ = 1 - 0.786 = 0.214
n( n 2 - 1) Thus there is positive rank correlation
where m1, m2, ..., are the number between X and Y. Both X and Y move
of r epetitions of ranks and in the same direction. However, the
relationship cannot be described as
m 31 - m1 strong.
..., their corresponding
12
correction factors. Activity
X has the value 35 both at the • Collect data on marks scored by
4th and 5th rank. Hence both are 10 of your classmates in class
given the average rank i.e., IX and X examinations. Calculate
the rank correlation coefficient
4+5 between them. If your data do not
th = 4.5 th rank
2 have any repetition, repeat the
exercise by taking a data set
having repeated ranks. What are
X Y Rank of Rank of Deviation in D2 the circumstances in which rank
Ranking corr elation coef ficient is
XR' YR'' D=R'–R'' preferred to simple correlation
25 55 6 2 4 16 coefficient? If data are precisely
45 80 1 1 0 0 measured will you still prefer
35 30 4.5 8 3.5 12.25 rank correlation coefficient to
40 35 3 7 –4 16 simple correlation? When can
15 40 8 5 3 9 you be indifferent to the choice?
19 42 7 4 3 9 Discuss in class.
35 36 4.5 6 –1.5 2.25
42 48 2 3 –1 1
4. CONCLUSION
Total ΣD = 65.5
We have discussed some techniques
The necessary correction thus is for studying the relationship between
104 STATISTICS FOR ECONOMICS
Recap
• Correlation analysis studies the relation between two variables.
• Scatter diagrams give a visual presentation of the nature of
relationship between two variables.
• Karl Pearson’s coefficient of correlation r measures numerically only
linear relationship between two variables. r lies between –1 and 1.
• When the variables cannot be measured precisely Spearman’s rank
correlation can be used to measure the linear relationship
numerically.
• Repeated ranks need correction factors.
• Correlation does not mean causation. It only means
covariation.
EXERCISES
Activity
• Use all the formulae discussed here to calculate r between
India’s national income and export taking at least ten
observations.