Correlation defined
If two variables are so related that a change in one is accompanied by a change in the other in such a way that (i) an increase in one is accompanied by an increase or decrease in the other, or (ii) a decrease in one is accompanied by a decrease or increase in the other, and the greater the magnitude of the change in one, the greater the magnitude of the change in the other, then the variables are said to be correlated.
For example,
(i) an increase in the intensity of cold results in greater sale of woollen clothes;
(ii) an increase in the price of a commodity results in a decrease in its demand;
(iii) an increase in the heights of children is accompanied by an increase in their weights.
(iv)
a decrease
in the price of
rations
correlation
[Table and plot: Height (cm) against Weight (kg).]
Degree of correlation
Correlation may be perfect or imperfect. When the changes in the corresponding values of two variables are proportional, directly or inversely, the correlation between them is said to be perfect. It is perfect positive if the increase (or decrease) in the values of one variable is accompanied by a proportional increase (or decrease) in the values of the second variable, e.g., the correlation between the circumferences of circles and their radii is perfect positive. If the reverse is the case, i.e., if the increase (or decrease) in one is accompanied by a proportional decrease (or increase) in the other, the correlation between the two variables is said to be perfect negative, e.g., if a rectangle has a constant area, the correlation between the lengths of its sides is perfect negative.

Such perfect (positive or negative) correlations are met with only in exact sciences, e.g., Mathematics, Physics, Chemistry, etc., but not in social and economic phenomena. In such phenomena, the changes in one variable are not generally proportional to the changes in the other. In this case, the correlation between them, if it exists, is said to be imperfect positive or negative depending upon its nature. Imperfect correlation again may be high, moderate or low. The imperfect correlation lies between perfect correlation and no correlation. Thus, we may have high positive correlation, e.g., between incomes and standard of living, or we may have high negative correlation, e.g., between supply and price of a commodity. Similarly, we have situations where correlation may be moderate (or low), negative or positive.
Perfect positive correlation: all the plotted points lie on a straight line rising from the lower left hand corner to the upper right hand corner.
Perfect negative correlation: all the plotted points lie on a straight line falling from the upper left hand corner to the lower right hand corner.
High degree of positive correlation: the plotted points fall in a narrow band and the points are rising from the lower left hand corner to the upper right hand corner.
Fig. 17.01
The various methods to determine whether two variables are correlated or not are Karl Pearson's coefficient of correlation and the Rank method (Spearman's and Kendall's coefficients). Out of the above, only Karl Pearson's and Spearman's methods are in the syllabus.
17.05. Scatter diagram
Scatter diagram is a graphic device for finding correlation between two variables. One variable, normally an independent variable or time, is plotted on the horizontal axis. The other variable, known as the dependent variable or the one to be predicted, is shown on the vertical axis. The movements of the pairs of these variables shown by dots on the graph reveal whether they move in the same or the opposite direction.

If the points form a band of some width, it will indicate imperfect correlation between the two variables. The direction of the band indicates the nature of correlation. If the band slopes upward, it indicates positive correlation and if it slopes downward then it indicates negative correlation. The width of the band gives an idea of the degree of correlation. The narrower the band, the greater is the degree of correlation.

When the points do not form a band, i.e., they are scattered in all directions, it indicates that there is no correlation.

In the case of perfect correlation, the points will be on a straight line. The method is mainly used when we are interested in finding out whether there is correlation, or only in getting a rough idea about its nature and degree. It does not give us any measure of correlation. The following diagrams illustrate the various cases.
Fig. 17.02. Scatter diagrams: (a) Perfect Positive Correlation, (b) Perfect Negative Correlation, (c) High Degree Positive Correlation, (f) No Correlation.
Line of best fit
Often there is not a straight line which passes through all the points, but we can still draw the line which comes closest to fitting all the points. We can estimate the position of this line. It shows the general trend of the relationship between the two sets of data. It may or may not pass through any of the data points.

The closer the points are to this line of best fit, the higher the correlation. The gradient of the line is not important, except that a vertical or horizontal line of best fit means that the variables are not correlated.
Fig. 17.03. Scatter diagrams with lines of best fit: positive correlation, no correlation, high negative correlation, low negative correlation.
Note. The line of best fit is also called the line of regression; there is a method to find its equation.
Ex. 1. Construct a straight line which approximates the following data, i.e., the line of best fit.

x :   1   3   4   6   8   9   11   14
y :   1   2   4   4   5   7    8    9

Sol. Plot the points (1, 1), (3, 2), (4, 4), (6, 4), (8, 5), (9, 7), (11, 8) and (14, 9) on a coordinate system as shown in Fig. 17.04.

Fig. 17.04
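The line in Ex. 1 is drawn by judging the trend of the plotted points. As a cross-check, the same data can be fitted numerically. The sketch below swaps in the least-squares method, which the example itself does not use; the function name `fit_line` is an illustrative assumption.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Normal equations of least squares, solved for slope and intercept.
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


# Data of Ex. 1.
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]

a, b = fit_line(xs, ys)
print(f"y = {a:.3f} + {b:.3f} x")
```

For these data the fitted line works out to y = 6/11 + (7/11) x, which should lie close to a line drawn by eye through the plotted points.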
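Covariance
Definition. If the variable X takes the values $x_1, x_2, x_3, \ldots, x_n$ and another variable Y takes the values $y_1, y_2, y_3, \ldots, y_n$, then the covariance between the two variables X and Y is written as Cov (X, Y) and is defined as

$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y}),$$

where $\bar{x}$ and $\bar{y}$ denote the arithmetic means of the two series, i.e.,

$$\bar{x} = \frac{x_1 + x_2 + \ldots + x_n}{n}, \qquad \bar{y} = \frac{y_1 + y_2 + \ldots + y_n}{n}.$$

Formula.
$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum d_x d_y, \quad \text{where } d_x = x_i - \bar{x},\ d_y = y_i - \bar{y}.$$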
If we divide the covariance by the product of the individual standard deviations, the quotient so obtained is called the correlation coefficient. As it was suggested by Karl Pearson, it is called Pearson's coefficient of correlation. The symbol $\rho$ (pronounced 'rho') is a letter of the Greek alphabet corresponding to the English r.

Formula.
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\sum d_x d_y}{n\,\sigma_x \sigma_y}.$$

Since
$$\sigma_x = \sqrt{\frac{\sum d_x^2}{n}} \quad \text{and} \quad \sigma_y = \sqrt{\frac{\sum d_y^2}{n}},$$
therefore,
$$\rho(X, Y) = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2}\,\sqrt{\sum d_y^2}},$$

where $d_x = x_i - \bar{x}$, $d_y = y_i - \bar{y}$ are the deviations taken from the actual mean.
Srnce the above formula is based
es
on
Edrdr,
If r = 0, there is no correlation. Also, as a general rule:
r from ± 0.00 to ± 0.20 denotes indifferent or negligible relationship;
r from ± 0.20 to ± 0.40 denotes low correlation or slight relationship;
r from ± 0.40 to ± 0.70 denotes substantial or marked relationship;
r from ± 0.70 to ± 1.00 denotes high to very high relationship.
These ranges are somewhat tentative and can only be accepted as a general guide.

It will be seen later on that the coefficient of correlation always lies between -1 and +1 [see Q. 30 in Exercise 17 (a) and Art. 17.10].
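Sol. We have
$$r = \frac{\sum xy}{n\,\sigma_x \sigma_y} \quad \Rightarrow \quad r^2 = \frac{(\sum xy)^2}{n^2\,\sigma_x^2\,\sigma_y^2}.$$
Since $\sigma_x^2 = \dfrac{\sum x^2}{n} = \dfrac{90}{n}$,
$$0.25 = \frac{14400}{n \times 90 \times 64} \quad \Rightarrow \quad n = 10.$$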
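Pearson's formula based on deviations from the actual means, together with the general guide to the size of r, can be sketched in Python as follows. The function names and the five sample pairs are illustrative assumptions, not taken from the text. The guide's bands share their end points, so this sketch assigns each boundary value to the lower band.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's coefficient using deviations from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


def describe(r):
    """Verbal label for r according to the general guide (sign ignored)."""
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size <= 0.20:
        return "indifferent or negligible relationship"
    if size <= 0.40:
        return "low correlation or slight relationship"
    if size <= 0.70:
        return "substantial or marked relationship"
    return "high to very high relationship"


# Hypothetical data, chosen only to exercise the two functions.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4), "->", describe(r))
```

The label is only as reliable as the guide itself, which is a rough rule rather than a test of significance.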
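Ex. The following table gives the test scores and sales by nine salesmen during last one year. Calculate the coefficient of correlation between the test scores and the sales. (I.S.C. 1993)

Test scores :  14  19  24  21  26  22  15  20  19
Sales       :  31  36  48  37  50  45  33  41  39

Sol. Here $\bar{x} = \dfrac{180}{9} = 20$ and $\bar{y} = \dfrac{360}{9} = 40$.

  x     y    dx    dy   dx²   dy²   dxdy
 14    31    -6    -9    36    81     54
 19    36    -1    -4     1    16      4
 24    48     4     8    16    64     32
 21    37     1    -3     1     9     -3
 26    50     6    10    36   100     60
 22    45     2     5     4    25     10
 15    33    -5    -7    25    49     35
 20    41     0     1     0     1      0
 19    39    -1    -1     1     1      1
180   360     0     0   120   346    193   (totals)

$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2}\,\sqrt{\sum d_y^2}} = \frac{193}{\sqrt{120 \times 346}} = 0.947 \text{ approx.},$$
which shows a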
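very high relationship.

Ex. Compute the correlation coefficient between the corresponding values of x and y in the following table. (I.S.C. 2007)

X :   2    4    5    6    8   11
Y :  18   12   10    8    7    5

Sol. Here $\bar{x} = \dfrac{36}{6} = 6$ and $\bar{y} = \dfrac{60}{6} = 10$.

  x     y    dx    dy   dx²   dy²   dxdy
  2    18    -4     8    16    64    -32
  4    12    -2     2     4     4     -4
  5    10    -1     0     1     0      0
  6     8     0    -2     0     4      0
  8     7     2    -3     4     9     -6
 11     5     5    -5    25    25    -25
 36    60     0     0    50   106    -67   (totals)

Hence, the correlation coefficient is
$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2}\,\sqrt{\sum d_y^2}} = \frac{-67}{\sqrt{50 \times 106}} = -0.92 \text{ approx.}$$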
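Second formula for $\rho(X, Y)$. We have
$$\sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2) = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}.$$
Similarly,
$$\sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n},$$
and
$$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i y_i - x_i\bar{y} - y_i\bar{x} + \bar{x}\bar{y}) = \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}.$$
Hence
$$\rho(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}} \qquad \text{(II)}$$
where the sums are taken over the n pairs of values of the two series.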
Calculate the coefficient of correlation between x and y for the following data.
formula
3
l0
I 5
I
4 I
tle
9 4
8
7
;
8l I r00 I
49 I 16 64 l,q
qs
16 I 25t4 36 I
e l2s
t
100
20
7
t5
4
t0
Sr
ffi
321
).rxy
Lr.n
2Y
54
28 64
63
302.s
18.5
:o
60
18.5
x
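Ex. 5. In order to find the correlation coefficient between two variables x and y from 12 pairs of observations, the following calculations were made: $\sum x = 30$, $\sum y = 5$, $\sum x^2 = 670$, $\sum y^2 = 285$, $\sum xy = 334$. On subsequent verification it was found that the pair (x = 11, y = 4) was copied wrongly, the correct values being (x = 10, y = 14). Find the correct value of correlation coefficient.

Sol. Corrected $\sum x$ = given $\sum x$ - incorrect value + correct value = 30 - 11 + 10 = 29.
Similarly, corrected $\sum y$ = 5 - 4 + 14 = 15,
corrected $\sum x^2 = 670 - (11)^2 + (10)^2 = 649$,
corrected $\sum y^2 = 285 - (4)^2 + (14)^2 = 465$,
corrected $\sum xy = 334 - (11)(4) + (10)(14) = 430$.

The correct value of correlation coefficient is given by
$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}} = \frac{430 - \dfrac{29 \times 15}{12}}{\sqrt{\left[649 - \dfrac{(29)^2}{12}\right]\left[465 - \dfrac{(15)^2}{12}\right]}} = \frac{393.75}{\sqrt{578.92 \times 446.25}} = 0.7747.$$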
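The correction procedure of Ex. 5 (remove the wrong pair from every total, add the right pair, then apply the direct-values formula) can be sketched as below. The function names are illustrative assumptions; the totals are those of Ex. 5, with n = 12.

```python
from math import sqrt


def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Direct-values formula for r from the five totals of n pairs."""
    numerator = sxy - sx * sy / n
    denominator = sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return numerator / denominator


def correct_sums(sums, wrong, right):
    """Replace one wrongly copied pair (x, y) in the totals by the right pair."""
    sx, sy, sxx, syy, sxy = sums
    xw, yw = wrong
    xr, yr = right
    return (
        sx - xw + xr,
        sy - yw + yr,
        sxx - xw ** 2 + xr ** 2,
        syy - yw ** 2 + yr ** 2,
        sxy - xw * yw + xr * yr,
    )


# Totals as first calculated in Ex. 5: sum x, sum y, sum x^2, sum y^2, sum xy.
given = (30, 5, 670, 285, 334)
corrected = correct_sums(given, wrong=(11, 4), right=(10, 14))
print(corrected)
print(round(r_from_sums(12, *corrected), 4))
```

Working on the totals avoids recomputing anything from the individual observations, which is the point of the exercise.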
Third formula for $\rho(X, Y)$ (when the deviations are taken from an assumed mean). When the actual means are fractions, or the values of x and y are large, the calculation of $\rho(X, Y)$ by either of the first two formulas becomes tedious. It can be simplified by considering the deviations of x and y from assumed means.

If $u_i = x_i - A$ and $v_i = y_i - B$, then
$$\rho(X, Y) = \frac{\sum uv - \dfrac{(\sum u)(\sum v)}{n}}{\sqrt{\left[\sum u^2 - \dfrac{(\sum u)^2}{n}\right]\left[\sum v^2 - \dfrac{(\sum v)^2}{n}\right]}},$$
A and B being the assumed means and n the number of pairs of observations. This formula is used when the actual means are fractions.

In some books you will find the notation $d_x$ and $d_y$ for u and v respectively. Using this notation, the above formula can be written as
$$\rho(X, Y) = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\left[\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}\right]\left[\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}\right]}}.$$
In this book we have used the notation $d_x$ and $d_y$ respectively for the deviations of x and y from their actual means.

Calculate the arithmetic means right in the beginning to know whether the means are whole numbers or fractions, so that you may apply the formula accordingly. If the given values are small, then apply formula (II) involving direct values only.
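That shifting both series by assumed means leaves the coefficient unchanged can be checked numerically. In the sketch below the function names and the choice A = 7, B = 5 are illustrative assumptions; the data are those of Ex. 1.

```python
from math import sqrt


def r_direct(xs, ys):
    """Direct-values formula: r from sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (sxy - sx * sy / n) / sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))


def r_assumed_mean(xs, ys, a, b):
    """Same formula applied to the deviations u = x - a and v = y - b."""
    us = [x - a for x in xs]
    vs = [y - b for y in ys]
    return r_direct(us, vs)


xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]

print(round(r_direct(xs, ys), 4))
print(round(r_assumed_mean(xs, ys, a=7, b=5), 4))
```

Both calls should print the same value, so the assumed means can be chosen purely to keep the arithmetic small.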
Interpretation of correlation coefficient
The coefficient of correlation shall always be between -1 and +1.
When r is +1, there is perfect positive correlation between the variables.
When r is -1, there is perfect negative correlation between the variables.
When r is between 0.7 and 0.999, there is a high degree of correlation between the variables. The correlation shall be positive if the sign of r is plus (+) and negative if the sign of r is minus (-).
When r is between 0.5 and 0.699, there is a moderate degree of correlation between the variables.
When r is less than 0.5, there is a low degree of correlation between the variables.
When r is zero, there is no correlation between the variables.

Ex. Calculate Karl Pearson's correlation coefficient between the marks in English and another subject obtained by 10 students.
_ ^
l0
25
184
lo
ls.4o,
t=ffi=rt.z
uv
l2
22
8
7
64
6
4
a J
a
36
48
28
49
25
t6
4
9 0 0
I
t3
25
l6 l5
18
5
7
l0
21
0
0 6 35
49
T'
4'
_1
t6
49
36
0 0
ll
t2
25
l8
t7
23
6
7
J
t
5
49
9
25 36
I
2l
20
24
l8
z
a
l7
l
122
We take
l8
..
Ia Iy
I
rl
_
./{:so
V
p2 _
!/2 l0
Ex.
Jqqqssji
and y
Sol. Calculate
p6, n
To draw the scatter diagram, plot the points (2, 16), (4, 14), (14, 4) and (16, 2). Fig. 17.05 shows the required scatter diagram.

Fig. 17.05

The points in the scatter diagram proceed in a line from top to the bottom, which indicates that x and y are in perfect negative correlation.
Ex. 8. (a) Calculate the value of the correlation coefficient for the following data: (1, 13), (2, 23), (3, 33), (4, 43), (5, 53), (6, 63), (7, 73), (8, 83), (9, 93), (10, 103), (11, 10.5), (12, 11), (13, 11.5), (14, 12), (15, 12.5), (16, 13), (17, 13.5), (18, 14), (19, 14.5), (20, 15).
(b) Draw the scatter diagram.
(c) Comment on the result.
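Sol. Let the assumed mean of the first variate X be 10 and that of the second variate Y be 13, so that u = x - 10 and v = y - 13.

  x      y      u      v     u²       v²       uv
  1     13     -9      0     81        0        0
  2     23     -8     10     64      100      -80
  3     33     -7     20     49      400     -140
  4     43     -6     30     36      900     -180
  5     53     -5     40     25     1600     -200
  6     63     -4     50     16     2500     -200
  7     73     -3     60      9     3600     -180
  8     83     -2     70      4     4900     -140
  9     93     -1     80      1     6400      -80
 10    103      0     90      0     8100        0
 11   10.5      1   -2.5      1     6.25     -2.5
 12     11      2     -2      4     4.00       -4
 13   11.5      3   -1.5      9     2.25     -4.5
 14     12      4     -1     16     1.00       -4
 15   12.5      5   -0.5     25     0.25     -2.5
 16     13      6      0     36     0.00        0
 17   13.5      7    0.5     49     0.25      3.5
 18     14      8      1     64     1.00        8
 19   14.5      9    1.5     81     2.25     13.5
 20     15     10      2    100     4.00       20

n = 20, $\sum u = 10$, $\sum v = 447.5$, $\sum u^2 = 670$, $\sum v^2 = 28521.25$, $\sum uv = -1172.5$.

$$\rho(X, Y) = \frac{\sum uv - \dfrac{(\sum u)(\sum v)}{n}}{\sqrt{\left[\sum u^2 - \dfrac{(\sum u)^2}{n}\right]\left[\sum v^2 - \dfrac{(\sum v)^2}{n}\right]}} = \frac{-1172.5 - \dfrac{10 \times 447.5}{20}}{\sqrt{\left[670 - \dfrac{(10)^2}{20}\right]\left[28521.25 - \dfrac{(447.5)^2}{20}\right]}} = \frac{-1396.25}{\sqrt{665 \times 18508.47}} = \frac{-1396.25}{3508.30} = -0.398 \text{ approx.}$$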
Plotting the points (1, 13), (2, 23), ..., (20, 15) we obtain the scatter diagram as shown in
I.11.
Sometimes such problems are faced that it is possible to arrange the various items of a series in serial order but the quantitative measurement of their values is difficult. For example, it is possible for a class teacher to arrange his students in ascending or descending order of intelligence, even though intelligence cannot be measured quantitatively. No doubt, the quantitative study about the intelligence of students can be made by holding an examination and assigning them marks, but this method can never be said to be infallible. There are many such attributes which are incapable of quantitative measurements, for example, honesty, character, morality, etc. In such cases it is possible to rank the individuals in some order. The most intelligent individual may be given rank 1, the next rank 2 and so on.

The correlation coefficient between two series of ranks is called 'Rank Correlation Coefficient'. The formula for coefficient of rank correlation as given by Edward Spearman is

$$R = 1 - \frac{6\sum D^2}{n(n^2 - 1)} \quad \text{or} \quad R = 1 - \frac{6\sum D^2}{n^3 - n},$$

where D is the difference between the corresponding ranks of the two series and n is the number of individuals in each series.

Note 1. Instead of assigning ranks 1, 2, 3, ... from highest to lowest, we can also assign these ranks from the lowest to highest, i.e., rank 1 to the least intelligent, rank 2 to the next more intelligent, the next rank 3, and so on.

Note 2. Remember that the algebraic sum of the rank differences is always 0, i.e., $\sum D = 0$. If it is not so, then some mistake has been committed at the time of assigning ranks.

Note 3. The interpretations of the values of R are the same as given on page 6 in Art. 17.05.
Type 1. When ranks are given
Working rule
Step 1. Compute D, the difference of the ranks.
Step 2. Compute D² and get the sum $\sum D^2$.
Step 3. Substitute the values in the formula.
Ex. The ranks of 10 students in Statistics and Mathematics are given below. To what extent is the knowledge of the students in the two subjects related?

Statistics  :  1   2   3   4   5   6   7   8   9  10
Mathematics :  2   4   1   5   3   9   7  10   6   8

Sol.
Rank in Statistics (x)   Rank in Mathematics (y)   D = x - y    D²
         1                         2                  -1         1
         2                         4                  -2         4
         3                         1                   2         4
         4                         5                  -1         1
         5                         3                   2         4
         6                         9                  -3         9
         7                         7                   0         0
         8                        10                  -2         4
         9                         6                   3         9
        10                         8                   2         4
                                                 ΣD = 0    ΣD² = 40

$$R = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 40}{10 \times 99} = 1 - \frac{240}{990} = +0.76.$$

Caution. When the ranks are already given as in the above example, do not commit the mistake of assigning ranks to them again.
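The working rule for Type 1 can be sketched in Python as follows. The function name is an illustrative assumption; the two rank lists are the Statistics and Mathematics ranks of the worked example. The check on the sum of the differences follows Note 2.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Rank correlation R = 1 - 6*sum(D^2) / (n*(n^2 - 1)) for untied ranks."""
    n = len(ranks_1)
    # Step 1: differences of the corresponding ranks.
    diffs = [a - b for a, b in zip(ranks_1, ranks_2)]
    if sum(diffs) != 0:
        raise ValueError("rank differences must add up to 0; check the ranks")
    # Step 2: squares of the differences and their sum.
    sum_d2 = sum(d * d for d in diffs)
    # Step 3: substitute in the formula.
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


statistics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mathematics = [2, 4, 1, 5, 3, 9, 7, 10, 6, 8]
print(round(spearman_from_ranks(statistics, mathematics), 2))
```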
Type 2. When ranks are not given
When no ranks are given, but actual data are given, then we should assign ranks. We can give ranks by taking the highest value as 1 or the lowest value as 1, the next to the highest (lowest) as 2, and follow the same procedure for both the variables.

Ex. The marks of nine students in two subjects x and y are given below. Compute their ranks in the two subjects and the coefficient of correlation of ranks. Interpret the result. (I.S.C. 2002)

x :  35  23  47  17  10  43   9   6  28
y :  30  33  45  23   8  49  12   4  31

Sol.
  x     y    Rank in x (R1)   Rank in y (R2)   D = R1 - R2    D²
 35    30         3                5               -2          4
 23    33         5                3                2          4
 47    45         1                2               -1          1
 17    23         6                6                0          0
 10     8         7                8               -1          1
 43    49         2                1                1          1
  9    12         8                7                1          1
  6     4         9                9                0          0
 28    31         4                4                0          0
                                                         ΣD² = 12

$$r = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 12}{9 \times 80} = 1 - \frac{72}{720} = 0.9.$$

The high value of r indicates a very high relationship. This means that the students who are good in one subject are good in Mathematics also.
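When only the marks are known, the ranking step can be automated before the formula is applied. The sketch below assumes that no value is repeated within a series (the tied case needs the correction discussed later); the function names are illustrative, and the nine pairs of marks are those of the worked example for Type 2.

```python
def assign_ranks(values, highest_first=True):
    """Rank distinct values: 1 for the highest (or the lowest) and so on."""
    if len(set(values)) != len(values):
        raise ValueError("repeated values need average ranks, not handled here")
    ordered = sorted(values, reverse=highest_first)
    position = {value: index + 1 for index, value in enumerate(ordered)}
    return [position[value] for value in values]


def rank_correlation(xs, ys):
    """Spearman's R from raw, untied data."""
    n = len(xs)
    ranks_x = assign_ranks(xs)
    ranks_y = assign_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


x = [35, 23, 47, 17, 10, 43, 9, 6, 28]
y = [30, 33, 45, 23, 8, 49, 12, 4, 31]
print(assign_ranks(x))
print(assign_ranks(y))
print(round(rank_correlation(x, y), 2))
```

Ranking both series from the lowest value instead (`highest_first=False`) gives the same coefficient, as long as the same convention is used for both.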
Ex. 12. Ten competitors in a beauty contest are ranked by three judges in the following order:
5 8
7 r
*d
8 6
Use the rank correlation coefficient to discuss which pair of judges have the nearest approach to common tastes in beauty.
Second
4
8
6 9 l0
t0
l0
9
l
2
lr')
t
Third
6
7
8
Dtz:
RrRz
Drz: RrR,
Dzz: Rz*R.'
_1
I
D,,,
9 9 9
D,l
25
D,,,
4
I
6
5
3 3 3
')
5 2 4
7
5
4
16
I
5
l
5
,l
25 0
I
49
l0
9 2
3
l0
J
3
0 4
I
4
I
l
I
l6
9 0
t6 t6
I
I
I I
2
I
.)
_I
_J
l6
I I
25
0 4 2D?t
,DI,
:74
l I
I
'rz
u'!f'
EDlt
156
:44
 I
'J
tl
$$r_{23} = 1 - \frac{6 \times 44}{990} = 0.73.$$
Since $r_{23}$ is maximum, we conclude that the pair of second and third judges has the nearest approach to common tastes in beauty.
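Ex. 13. The coefficient of rank correlation of marks obtained by 10 students in English and another subject was found to be 0.5. It was later discovered that the difference in ranks in the two subjects obtained by one of the students was wrongly taken as 3 instead of 7. Find the correct coefficient of rank correlation.

Sol. We have
$$0.5 = 1 - \frac{6\sum D^2}{10 \times 99} \quad \Rightarrow \quad \sum D^2 = \frac{0.5 \times 990}{6} = 82.5.$$
Corrected $\sum D^2 = 82.5 - 3^2 + 7^2 = 122.5$.
Hence the correct coefficient of rank correlation is
$$R = 1 - \frac{6 \times 122.5}{990} = 1 - \frac{735}{990} = 0.26 \text{ approx.}$$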
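The correction in Ex. 13 amounts to inverting the rank formula to recover the sum of squared differences, replacing one squared difference, and applying the formula again. A minimal sketch, with illustrative function names:

```python
def sum_d2_from_r(r, n):
    """Invert R = 1 - 6*S / (n*(n^2 - 1)) to recover S = sum of D^2."""
    return (1 - r) * n * (n * n - 1) / 6


def r_from_sum_d2(sum_d2, n):
    """Spearman's formula for untied ranks."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


n = 10
wrong_total = sum_d2_from_r(0.5, n)           # total implied by the reported R
corrected_total = wrong_total - 3 ** 2 + 7 ** 2  # swap the wrong D for the right one
print(wrong_total, corrected_total)
print(round(r_from_sum_d2(corrected_total, n), 4))
```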
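Ex. 14. Find out rank correlation from the following data:

S.N.             :   1    2    3    4    5    6    7    8    9   10
Rank differences :  -2   -4   -1   +3   +2    0    ?   +3   +3   -2

Sol. First we find the unknown rank difference by using the fact that $\sum D = 0$. This gives:
value of the unknown rank difference + 11 - 9 = 0, i.e., the unknown rank difference = -2.

Now we have

S.N. :   1    2    3    4    5    6    7    8    9   10
D    :  -2   -4   -1    3    2    0   -2    3    3   -2
D²   :   4   16    1    9    4    0    4    9    9    4

N = 10, $\sum D = 0$, $\sum D^2 = 60$.

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 60}{10 \times 99} = 1 - \frac{360}{990} = 0.64 \text{ approx.}$$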
If two or more individuals are placed equal in a series, a correction or modification in the formula for the rank correlation coefficient becomes necessary, because the formula is based on the supposition that the ranks of the various items are different and that no rank is given to more than one item.

The problem is solved by assigning a common rank to each of the individuals who are in a tie. This common rank is the average of the ranks of these individuals. For example, suppose we have the series 90, 85, 80, 80, 80, 80, 78, 72, 69, 69, 55 and one still lower value.

Rank 1 and rank 2 are assigned to the two values 90 and 85, and the next value 80 appears four times, at ranks 3, 4, 5 and 6, so the common rank assigned to each of them is $\dfrac{3 + 4 + 5 + 6}{4} = 4.5$. The next value lower than 80, viz., 78, would be assigned the rank 7 because 6 ranks have already been assigned, and 72 the rank 8. Now, we see that the value 69 is repeated twice at rank 9 and 10, then the common rank assigned will be $\dfrac{9 + 10}{2} = 9.5$, and the next value 55 will have the rank 11 and the lowest value the rank 12.

If the ranks are assigned from the lowest to the highest, the lowest value gets rank 1 and 55 gets rank 2; rank $\dfrac{3 + 4}{2} = 3.5$ is assigned to each of the values 69, rank 5 to 72, rank 6 to 78, then rank $\dfrac{7 + 8 + 9 + 10}{4} = 8.5$ to each of the values 80, rank 11 to 85 and rank 12 to 90.

Value         :  90   85   80    80    80    80    78   72   69    69    55
Rank assigned :  12   11   8.5   8.5   8.5   8.5    6    5   3.5   3.5    2
In a series, if there are m items whose ranks are common, then for repeated values an adjustment or correction $\dfrac{1}{12}(m^3 - m)$ is added to $\sum D^2$ for each repeating value in both the series. The modified formula, then, is given by

$$R = 1 - \frac{6\left[\sum D^2 + \dfrac{1}{12}(m_1^3 - m_1) + \dfrac{1}{12}(m_2^3 - m_2) + \ldots\right]}{n(n^2 - 1)},$$

where $m_1, m_2, m_3, \ldots$ are the numbers of times a value is repeated.

Ex. 15. The figures below give the numbers of passenger-carrying vehicles under repair, and the number of persons killed in train accidents in Great Britain in the years listed. Find the coefficient of rank correlation and comment on the result.
The method
IH.s.c.l
br hls
x
Vehicles under
v
No.
of
repair (1A00's) )1
2.s 2.7
2,5 2.5
Rank
Rank
Persons 30
'r'7
ofx
7
D
a
ofy
9
D2
h,, E04o
4 0
I 64
9
l0
7
l0
6 2
7
0 I
8
50 76 37
14
f*' lEa
f*t
i
l0 l0
5 3 1.5 1.5
2.7
3.5
lt44
ll
8 3
*4
J
t6
9
34
75
l9r5
lv+6
0
3.5
.5
0
12.25
60
121 74
llAT
1948
5
1
0.25 0
4.0
4.0
t 15.5
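Adding the correction $\dfrac{1}{12}(m^3 - m)$ for the repeated values in the x-series to the value of $\sum D^2$ (Art. 17.13), we get

$$R = 1 - \frac{6\left[\sum D^2 + 4.5\right]}{n(n^2 - 1)} = 1 - \frac{6[115.5 + 4.5]}{11 \times 120} = 1 - \frac{720}{1320} = 1 - 0.545 = 0.455.$$

If there had been repeated values in the y-series also, we would have added the correction for them also.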
Since the value 0.455 of r lies between 0.40 and 0.70, therefore, it signifies substantial or marked relationship. It means that ordinarily, the higher the number of passenger-carrying vehicles under repair, the higher is the number of persons killed in train accidents in Britain. But it does not follow that one is the cause of the other. There can be many other causes, for example, wear, etc.
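Ex. 16. Find out the rank correlation coefficient between the heights of fathers and sons from the following data:

Height of fathers in inches :  65  66  67  67  68  69  70  72
Height of sons in inches    :  67  68  65  68  72  72  69  71

Sol.
  x     y    Rank of x (R1)   Rank of y (R2)   D = R1 - R2     D²
 65    67         8                7                1          1
 66    68         7                5.5              1.5        2.25
 67    65         5.5              8               -2.5        6.25
 67    68         5.5              5.5              0          0
 68    72         4                1.5              2.5        6.25
 69    72         3                1.5              1.5        2.25
 70    69         2                4               -2          4
 72    71         1                3               -2          4
                                                         ΣD² = 26

In the x-series, 67 occurs twice and its rank is $\dfrac{5 + 6}{2} = 5.5$.
In the y-series, 72 occurs twice and its rank is $\dfrac{1 + 2}{2} = 1.5$; 68 occurs twice and its rank is $\dfrac{5 + 6}{2} = 5.5$.

Correction to be added to $\sum D^2$ $= \dfrac{1}{12}(2^3 - 2) + \dfrac{1}{12}(2^3 - 2) + \dfrac{1}{12}(2^3 - 2) = 0.5 + 0.5 + 0.5 = 1.5$.

$$\therefore \quad R = 1 - \frac{6[26 + 1.5]}{8(8^2 - 1)} = 1 - \frac{6 \times 27.5}{504} = 1 - \frac{165}{504} = 0.67 \text{ approx.}$$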
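The treatment of repeated values (average ranks plus the correction of (m³ - m)/12 for each repeated value) can be sketched as below. The function names are illustrative assumptions. The data are the heights of Ex. 16, with the eighth father's height taken as 72, the value consistent with the total of 26 for the squared rank differences in the solution.

```python
from collections import Counter


def average_ranks(values):
    """Rank with the highest value as 1; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every value occurring m times (0 when m = 1)."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values())


def spearman_with_ties(xs, ys):
    """Modified rank formula with the correction added to the sum of D^2."""
    n = len(xs)
    ranks_x = average_ranks(xs)
    ranks_y = average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1))


fathers = [65, 66, 67, 67, 68, 69, 70, 72]  # last value assumed, see lead-in
sons = [67, 68, 65, 68, 72, 72, 69, 71]

print(average_ranks(fathers))
print(average_ranks(sons))
print(round(spearman_with_ties(fathers, sons), 4))
```

Note that this follows the book's correction formula; statistical libraries often compute a tied rank correlation as Pearson's coefficient of the average ranks, which can give a slightly different value.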
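Ex. 17. Find the Spearman's rank coefficient of correlation from the following data:

X :  48  33  40   9  16  16  65  24  16  57
Y :  13  13  24   6  15   4  20   9   6  19

Sol. We will solve this question by assigning ranks from lowest to highest, i.e., rank 1 to the least value, rank 2 to the next value and so on.

Series X   Series Y   Rank X (R1)   Rank Y (R2)   Rank differences D = R1 - R2   Squares D²
   48         13          8            5.5                  +2.5                    6.25
   33         13          6            5.5                  +0.5                    0.25
   40         24          7           10                    -3                      9
    9          6          1            2.5                  -1.5                    2.25
   16         15          3            7                    -4                     16
   16          4          3            1                    +2                      4
   65         20         10            9                    +1                      1
   24          9          5            4                    +1                      1
   16          6          3            2.5                  +0.5                    0.25
   57         19          9            8                    +1                      1
 n = 10                                                     ΣD = 0              ΣD² = 41

16 is repeated 3 times in series X, so $\dfrac{1}{12}(3^3 - 3)$ will be added to $\sum D^2$. In series Y, 13 is repeated twice and 6 is repeated twice, so $\dfrac{1}{12}(2^3 - 2)$ will be added twice to $\sum D^2$.

$$R = 1 - \frac{6\left[\sum D^2 + \dfrac{1}{12}(m_1^3 - m_1) + \dfrac{1}{12}(m_2^3 - m_2) + \dfrac{1}{12}(m_3^3 - m_3)\right]}{n(n^2 - 1)} = 1 - \frac{6\left[41 + \dfrac{1}{12}(3^3 - 3) + \dfrac{1}{12}(2^3 - 2) + \dfrac{1}{12}(2^3 - 2)\right]}{10 \times 99} = 1 - \frac{6 \times 44}{990} = 0.73 \text{ approx.}$$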
Type 1.
1. The marks obtained by nine students in Physics and Mathematics are given below:

Physics     :  48  60  72  62  56  40  39  52  30
Mathematics :  62  78  65  70  38  54  60  32  31