Correlation defined
If two variables are so related that a change in one is accompanied by a change in the other in such a way that
(i) an increase in one is accompanied by an increase or decrease in the other, or
(ii) a decrease in one is accompanied by a decrease or increase in the other,
and the greater the magnitude of the change in one, the greater the magnitude of the change in the other, then the variables are said to be correlated.
For example
(i) an increase in the intensity of cold results in greater sale of woollen clothes;
(ii) an increase in the price of a commodity results in a decrease in its demand;
(iii) an increase in the heights of children is accompanied by an increase in their weights;
(iv) a decrease in the price of a commodity is accompanied by an increase in its demand.
Positive and negative correlation
Whether correlation is positive or negative would depend upon the direction in which the variables are moving. If both the variables move in the same direction, i.e., when an increase or decrease in one corresponds to an increase or decrease respectively in the other, then the correlation between the variables is said to be positive or direct. If both the variables move in opposite directions, i.e., when an increase in one corresponds to a decrease in the other or a decrease in one corresponds to an increase in the other, then the correlation between the two variables is said to be negative or inverse.
Illustrations
Positive correlation: Height (cm) against Weight (kg). As the height rises from 160 to 175, the weight rises from 58 to 70.
Negative correlation: Price (Rs per unit) against Demand (units). As the price falls from 20, the demand rises from 80.
Degree of correlation
Correlation may be perfect or imperfect. When the changes in the corresponding values of two variables are proportional, directly or inversely, the correlation between them is said to be perfect. It is positive if the increase (or decrease) in the values of one variable is accompanied by a proportional increase (or decrease) in the values of a second variable, e.g., the correlation between circumferences of circles and their radii is perfect positive. If the reverse is the case, i.e., if the increase (or decrease) in one is accompanied by a proportional decrease (or increase) in the other, the correlation between the two variables is said to be perfect negative, e.g., if a rectangle has constant area, the correlation between the lengths of its sides is perfect negative.
Such perfect (positive or negative) correlations are met with only in exact sciences, e.g., Mathematics, Physics, Chemistry, etc., but not in social and economic phenomena. In such phenomena, the changes in one variable are not generally proportional to the changes in the other. In this case, the correlation between them, if it exists, is said to be imperfect positive or negative depending upon its nature. Imperfect correlation again may be high, moderate or low. The imperfect correlation lies between perfect correlation and no correlation. Thus, we may have high positive correlation, e.g., between incomes and standard of living, or we may have high negative correlation, e.g., between supply and price of a commodity. Similarly, we have situations where correlation may be moderate (or low), negative or positive.
Perfect positive correlation: all the plotted points lie on a straight line rising from the lower left hand corner to the upper right hand corner.
Perfect negative correlation: all the plotted points lie on a straight line falling from the upper left hand corner to the lower right hand corner.
High degree positive correlation: the plotted points $(x_i, y_i)$, $i = 1, 2, 3, \ldots, n$, fall in a narrow band rising from the lower left hand corner to the upper right hand corner.
High degree negative correlation: the plotted points $(x_i, y_i)$, $i = 1, 2, 3, \ldots, n$, fall in a narrow band from the upper left hand corner to the lower right hand corner.
Fig. 17.01
Methods of studying correlation
The various methods to determine whether two variables are correlated or not are:
(i) Scatter diagram method
(ii) Karl Pearson's coefficient of correlation
(iii) Rank method (Spearman's and Kendall's coefficient)
Out of the above, only Karl Pearson's and Spearman's methods are in the syllabus.
17.05. Scatter diagram
Scatter diagram is a graphic device for finding correlation between two variables. One variable, normally an independent variable or time, is plotted on the horizontal axis. This is also called the predicting variable. The other variable, known as the dependent variable or the one to be predicted, is shown on the vertical axis. The movements of the pairs of these variables shown by dots on the graph reveal whether they move in the same or the opposite direction.
If the points form a band of some width, it will indicate imperfect correlation between the two variables. The direction of the band indicates the nature of correlation. If the band slopes upward, it indicates positive correlation and if it slopes downward then it indicates negative correlation. The width of the band gives an idea of the degree of correlation. The narrower the band, the greater is the degree of correlation.
When the points do not form a band, i.e., they are scattered in all directions, it indicates that there is no correlation between the variables.
In the case of perfect correlation, the points will be on a straight line. The method is mainly used when we are interested only in finding out whether there is correlation and in getting a rough idea about its nature and degree. It does not give us any measure of correlation. The following diagrams illustrate the various cases.
Fig. 17.02. Scatter diagrams: (a) Perfect Positive Correlation, (b) Perfect Negative Correlation, (c) High Degree Positive Correlation, (d) High Degree Negative Correlation, (e) Low Degree Positive Correlation, (f) Low Degree Negative Correlation, (g) No Correlation.
The line of best fit
Often there is not a straight line which passes through all the points, but we can still draw the line which comes closest to fitting all the points. We can estimate the position of the line by eye. This line shows the general trend of the relationship between the two sets of data. It may or may not pass through any of the data points.
The closer the points are to this line of best fit, the higher the correlation. The gradient of the line is not important, except that a vertical or horizontal line of best fit means that the variables are not correlated.
Here are some typical examples and their interpretation.
Fig. 17.03. Panels: high positive correlation; low positive correlation (not a strong relationship); no correlation; high negative correlation; low negative correlation (not a strong relationship).
Note. The method to find the equation of the line of best fit, also called the line of regression, is discussed later in this chapter.
Ex. 1. Construct a straight line which approximates the following data, i.e., the line of best fit.
x :  1   3   4   6   8   9   11   14
y :  1   2   4   4   5   7    8    9
Sol. Plot the points (1, 1), (3, 2), (4, 4), (6, 4), (8, 5), (9, 7), (11, 8) and (14, 9) on a rectangular coordinate system as shown in Fig. 17.04. A straight line approximating the data is drawn freehand in the figure.
Fig. 17.04
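The freehand line of Ex. 1 can be compared with a computed one. The sketch below is not the method used in the text; it assumes the ordinary least-squares line (the usual line of regression of y on x) as a stand-in, and the function name `least_squares_line` is purely illustrative.

```python
# Assumption: the least-squares line is used in place of the freehand line of Ex. 1.
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.3f} x + {intercept:.3f}")  # y = 0.636 x + 0.545
```

A line drawn by eye through the plotted points should have roughly this slope and pass close to the point of means (7, 5).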
17,01. Karl Pearson's coefficient of correlation
A mathematical method for measuring the degree of correlation between two variables X and Y was suggested by Karl Pearson. It is known as the Pearsonian coefficient of correlation and is denoted by the symbol $r$ or $\rho(X, Y)$*. The formula for finding the correlation coefficient is based on the concept of covariance, which we define below.
*The symbol $\rho$ (pronounced 'rho') is a letter of the Greek alphabet corresponding to the English r.
Covariance
Definition. If the variable X takes the values $x_1, x_2, x_3, \ldots, x_n$ and another variable Y takes the values $y_1, y_2, y_3, \ldots, y_n$, then the covariance between the two variables X and Y is written as Cov(X, Y) and is defined as
\[ \mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
or, written more simply,
\[ \mathrm{Cov}(X, Y) = \frac{1}{n} \sum d_x d_y, \quad \text{where } d_x = x_i - \bar{x},\ d_y = y_i - \bar{y}, \]
and $\bar{x}$, $\bar{y}$ denote the arithmetic means of the two series, i.e.,
\[ \bar{x} = \frac{x_1 + x_2 + \ldots + x_n}{n}, \qquad \bar{y} = \frac{y_1 + y_2 + \ldots + y_n}{n}. \]
If we divide the covariance by the product of the individual standard deviations, the quotient so obtained is called the correlation coefficient. As it was suggested by Karl Pearson, it is called Pearson's coefficient of correlation.
Formula I
\[ \rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\sum d_x d_y}{n\, \sigma_x \sigma_y} \]
Since $\sigma_x = \sqrt{\dfrac{\sum d_x^2}{n}}$ and $\sigma_y = \sqrt{\dfrac{\sum d_y^2}{n}}$, therefore,
\[ \rho(X, Y) \text{ or } r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} \]
where $d_x = x_i - \bar{x}$, $d_y = y_i - \bar{y}$ are the deviations taken from the actual mean.
Since the above formula is based on $\sum d_x d_y$, i.e., the product of the deviations of the observed values of x and y from their means, this method is also called the product moment method.
It will be seen later on [see Q. 30 in Exercise 17 (a) and Art. 17.10] that the coefficient of correlation $r$ is such that $-1 \le r \le 1$.
If $r = 1$, there is perfect positive correlation. If $r = -1$, there is perfect negative correlation. If $r = 0$, there is no correlation.
Also, as a general rule:
r from ± 0.00 to ± 0.20 denotes indifferent or negligible relationship;
r from ± 0.20 to ± 0.40 denotes low correlation or slight relationship;
r from ± 0.40 to ± 0.70 denotes substantial or marked relationship;
r from ± 0.70 to ± 1.00 denotes high to very high relationship.
This classification is somewhat tentative and can only be accepted as a general guide.
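The definitions above translate directly into code. The following is a minimal sketch under stated assumptions: the function names (`covariance`, `std_dev`, `pearson_r`, `describe`) and the toy data are illustrative and not from the text, the standard deviation divides by n as in Formula I, and the way the overlapping end-points of the verbal bands are assigned to one band is an arbitrary choice.

```python
from math import sqrt


def covariance(xs, ys):
    """Cov(X, Y) = (1/n) * sum of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def std_dev(values):
    """Standard deviation with divisor n, as used in Formula I."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / n)


def pearson_r(xs, ys):
    """Formula I: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


def describe(r):
    """Verbal guide from the text; end-point handling is an assumption."""
    size = abs(r)
    if size < 0.20:
        return "indifferent or negligible relationship"
    if size < 0.40:
        return "low correlation or slight relationship"
    if size < 0.70:
        return "substantial or marked relationship"
    return "high to very high relationship"


# Hypothetical data in which y is exactly twice x, so r should be 1 (up to rounding).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r = pearson_r(xs, ys)
print(round(r, 4), "->", describe(r))
```

Because y is proportional to x in this toy data, the value printed is 1.0 up to floating-point rounding, which corresponds to perfect positive correlation.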
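Ex. If $r = 0.5$, $\sum xy = 120$, $\sigma_y = 8$ and $\sum x^2 = 90$, find the number of items (x and y are deviations from the arithmetic means).
Sol. We have
\[ r = \frac{\sum xy}{n\, \sigma_x \sigma_y} \quad \Rightarrow \quad r^2 = \frac{(\sum xy)^2}{n^2\, \sigma_x^2\, \sigma_y^2}, \qquad \text{where } \sigma_x^2 = \frac{\sum x^2}{n} = \frac{90}{n}. \]
\[ \therefore \quad 0.25 = \frac{14400}{n \times 90 \times 64} = \frac{5}{2n} \quad \Rightarrow \quad n = 10. \]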
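Ex. The following table gives the test scores and sales by nine salesmen during the last one year.
Test scores          : 14  19  24  21  26  22  15  20  19
Sales (in '000 Rupees) : 31  36  48  37  50  45  33  41  39
Compute Karl Pearson's coefficient of correlation and interpret the result. (I.S.C. 1993)
Sol. Let x denote the test scores and y the sales.
  x     y    d_x = x - 20   d_y = y - 40   d_x^2   d_y^2   d_x d_y
 14    31        -6             -9           36      81       54
 19    36        -1             -4            1      16        4
 24    48         4              8           16      64       32
 21    37         1             -3            1       9       -3
 26    50         6             10           36     100       60
 22    45         2              5            4      25       10
 15    33        -5             -7           25      49       35
 20    41         0              1            0       1        0
 19    39        -1             -1            1       1        1
180   360         0              0          120     346      193
\[ \bar{x} = \frac{180}{9} = 20, \qquad \bar{y} = \frac{360}{9} = 40 \]
\[ r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} = \frac{193}{\sqrt{120 \times 346}} = \frac{193}{203.76} = 0.947 \text{ approx.}, \]
which shows a very high relationship.
Ex. Compute the correlation coefficient between the corresponding values of X and Y in the following table. (I.S.C. 2007)
X :  2   4   5   6   8   11
Y : 18  12  10   8   7    5
Sol.
  X     Y    d_x = X - 6   d_y = Y - 10   d_x^2   d_y^2   d_x d_y
  2    18        -4              8          16      64      -32
  4    12        -2              2           4       4       -4
  5    10        -1              0           1       0        0
  6     8         0             -2           0       4        0
  8     7         2             -3           4       9       -6
 11     5         5             -5          25      25      -25
 36    60         0              0          50     106      -67
\[ \bar{x} = \frac{36}{6} = 6, \qquad \bar{y} = \frac{60}{6} = 10 \]
\[ r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} = \frac{-67}{\sqrt{50 \times 106}} = \frac{-67}{\sqrt{5300}} = \frac{-67}{72.8} = -0.92 \]
Hence, the variables X and Y are highly negatively correlated.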
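The deviation table used in the last example can be rebuilt step by step. This is a small sketch assuming only the six (X, Y) pairs of that example; the variable names are illustrative.

```python
from math import sqrt

# Data of the example: X = 2, 4, 5, 6, 8, 11 and Y = 18, 12, 10, 8, 7, 5.
xs = [2, 4, 5, 6, 8, 11]
ys = [18, 12, 10, 8, 7, 5]

n = len(xs)
mean_x = sum(xs) / n  # 6.0
mean_y = sum(ys) / n  # 10.0

# Deviations from the actual means, one pair per row of the table.
dx = [x - mean_x for x in xs]
dy = [y - mean_y for y in ys]

sum_dx2 = sum(d * d for d in dx)               # 50.0
sum_dy2 = sum(d * d for d in dy)               # 106.0
sum_dxdy = sum(a * b for a, b in zip(dx, dy))  # -67.0

r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)
print(round(r, 2))  # -0.92
```

The three column totals match the last row of the table, and the quotient reproduces the value of r found by hand.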
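17.08. Second formula for $\rho(X, Y)$ (using directly the values $x_i$, $y_i$, without using deviations from the means)
We know that $\bar{x} = \dfrac{\sum x_i}{n}$ and $\bar{y} = \dfrac{\sum y_i}{n}$. Therefore
\[ \sum (x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x} \sum x_i + n\bar{x}^2 = \sum x_i^2 - \frac{2 (\sum x_i)^2}{n} + \frac{(\sum x_i)^2}{n} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}. \]
Similarly,
\[ \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}. \]
Also,
\[ \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i y_i - x_i \bar{y} - y_i \bar{x} + \bar{x}\bar{y}) = \sum x_i y_i - \bar{y} \sum x_i - \bar{x} \sum y_i + n \bar{x} \bar{y} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}. \]
Hence
\[ \rho(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \cdot \sum (y_i - \bar{y})^2}} \]
becomes
Formula II
\[ \rho(X, Y) \text{ or } r = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right] \left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}} \]
where x and y stand for the values of items in the X and Y series.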
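Ex. Calculate the coefficient of correlation between x and y for the data whose totals are given below.
Sol. Since the given values are small we can use Formula II. For the given data,
\[ n = 10, \quad \sum x = 55, \quad \sum y = 55, \quad \sum x^2 = 385, \quad \sum y^2 = 385, \quad \sum xy = 321. \]
\[ r = \frac{321 - \dfrac{55 \times 55}{10}}{\sqrt{\left(385 - \dfrac{55^2}{10}\right)\left(385 - \dfrac{55^2}{10}\right)}} = \frac{321 - 302.5}{\sqrt{(385 - 302.5)(385 - 302.5)}} = \frac{18.5}{\sqrt{82.5 \times 82.5}} = \frac{18.5}{82.5} = 0.224. \]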
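Formula II needs only five totals and the number of pairs, so it is easy to wrap in a helper. The sketch below assumes a hypothetical function name `r_from_totals`; the totals passed to it are those of the example above.

```python
from math import sqrt


def r_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Formula II: correlation coefficient computed from raw totals only."""
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Totals of the example: n = 10, sums 55 and 55, sums of squares 385 and 385, sum of products 321.
r = r_from_totals(10, 55, 55, 385, 385, 321)
print(round(r, 3))  # 0.224
```

Working from totals avoids computing any deviations, which is why the text recommends this form when the given values are small.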
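Ex. 5. In order to find the correlation coefficient between two variables x and y from 12 pairs of observations, the following calculations were made:
\[ \sum x = 30, \quad \sum y = 5, \quad \sum x^2 = 670, \quad \sum y^2 = 285, \quad \sum xy = 334. \]
On subsequent verification it was found that the pair (x = 11, y = 4) was copied wrongly, the correct values being (x = 10, y = 14). Find the correct value of the correlation coefficient.
Sol. Corrected $\sum x$ = given $\sum x$ - incorrect value + correct value = 30 - 11 + 10 = 29.
Similarly, corrected $\sum y$ = 5 - 4 + 14 = 15,
corrected $\sum x^2$ = 670 - $(11)^2$ + $(10)^2$ = 649,
corrected $\sum y^2$ = 285 - $(4)^2$ + $(14)^2$ = 465,
corrected $\sum xy$ = 334 - 11 × 4 + 10 × 14 = 430.
The correct value of the correlation coefficient is given by
\[ r = \frac{\sum xy - \dfrac{\sum x \cdot \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}} = \frac{430 - \dfrac{29 \times 15}{12}}{\sqrt{\left[649 - \dfrac{(29)^2}{12}\right]\left[465 - \dfrac{(15)^2}{12}\right]}} = \frac{393.75}{\sqrt{578.92 \times 446.25}} = 0.7747. \]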
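The correction of wrongly copied values in Ex. 5 amounts to subtracting the contribution of the wrong pair from each total and adding that of the correct pair. A minimal sketch follows; the dictionary layout and the helper name `replace_pair` are assumptions made for illustration.

```python
from math import sqrt

# Totals as originally (wrongly) computed in Ex. 5, for n = 12 pairs.
n = 12
totals = {"x": 30, "y": 5, "x2": 670, "y2": 285, "xy": 334}


def replace_pair(totals, wrong, correct):
    """Return new totals after replacing one (x, y) pair by its correct values."""
    (wx, wy), (cx, cy) = wrong, correct
    return {
        "x": totals["x"] - wx + cx,
        "y": totals["y"] - wy + cy,
        "x2": totals["x2"] - wx ** 2 + cx ** 2,
        "y2": totals["y2"] - wy ** 2 + cy ** 2,
        "xy": totals["xy"] - wx * wy + cx * cy,
    }


fixed = replace_pair(totals, wrong=(11, 4), correct=(10, 14))

# Formula II applied to the corrected totals.
numerator = fixed["xy"] - fixed["x"] * fixed["y"] / n
denominator = sqrt((fixed["x2"] - fixed["x"] ** 2 / n) * (fixed["y2"] - fixed["y"] ** 2 / n))
r = numerator / denominator
print(fixed)
print(round(r, 4))  # 0.7747
```

The corrected totals printed are 29, 15, 649, 465 and 430, the same as those obtained by hand.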
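17.09. Third formula for $\rho(X, Y)$ (when the deviations are taken from an assumed mean)
If the values $x_i$, $y_i$ are large or involve fractions, then neither of the two formulas given above is convenient. The calculation is simplified by considering the deviations from assumed means a and b respectively for the variates x and y.
If $u_i = \dfrac{x_i - a}{h}$, $v_i = \dfrac{y_i - b}{k}$, i.e., $x_i = h u_i + a$, $y_i = k v_i + b$, then on substituting in the formula
\[ \rho(X, Y) = \frac{\sum x_i y_i - \dfrac{\sum x_i \cdot \sum y_i}{n}}{\sqrt{\left[\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}\right]\left[\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}\right]}} \]
and simplifying, we obtain the following:
\[ \rho(X, Y) = \frac{\sum u_i v_i - \dfrac{\sum u_i \cdot \sum v_i}{n}}{\sqrt{\left[\sum u_i^2 - \dfrac{(\sum u_i)^2}{n}\right]\left[\sum v_i^2 - \dfrac{(\sum v_i)^2}{n}\right]}} \]
Using short notations u and v in place of $u_i$ and $v_i$ respectively, we may write this in a simpler and convenient form as under:
Formula III
\[ r = \frac{\sum uv - \dfrac{(\sum u)(\sum v)}{n}}{\sqrt{\left[\sum u^2 - \dfrac{(\sum u)^2}{n}\right]\left[\sum v^2 - \dfrac{(\sum v)^2}{n}\right]}} \]
where $u = \dfrac{X - A}{h}$ or $X - A$, $v = \dfrac{Y - B}{k}$ or $Y - B$, A and B being the assumed means and n the number of pairs of observations. This formula is used when actual means are fractions.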
Note 1. In some books you will find the notation $d_x$ and $d_y$ for u and v respectively. Using this notation the above formula can be written as
\[ \rho(X, Y) \text{ or } r(X, Y) = \frac{\sum d_x d_y - \dfrac{\sum d_x \cdot \sum d_y}{n}}{\sqrt{\left[\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}\right]\left[\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}\right]}} \]
In this book we have used the notation $d_x$ and $d_y$ respectively for the deviations from the actual means $\bar{x}$ and $\bar{y}$.
Note 2. Calculate the arithmetic means right in the beginning to know whether the means are whole numbers or fractions so that you may apply the formula accordingly. If the given values are small, then apply formula (II) involving direct values only.
Interpretation of correlation coefficient
The coefficient of correlation shall always be between -1 and +1.
When r is +1, there is perfect positive correlation between the variables.
When r is -1, there is perfect negative correlation between the variables.
When r is between 0.7 and 0.999, there is a high degree of correlation between the variables. The correlation shall be positive if the sign of r is plus (+) and negative if the sign of r is minus (-).
When r is between 0.5 and 0.699, there is a moderate degree of correlation between the variables.
When r is less than 0.5, there is a low degree of correlation between the variables.
When r is zero, there is no correlation between the variables.
Ex. 6. Calculate Karl Pearson's correlation coefficient between the marks in English (X) and a second subject (Y) obtained by 10 students.
Sol. Here $\bar{x} = \dfrac{184}{10} = 18.4$ and $\bar{y} = \dfrac{182}{10} = 18.2$. We take 18 as the assumed mean for both the series.
  X     Y    u = X - 18   v = Y - 18    u^2    v^2     uv
 10    12       -8           -6          64     36     48
 25    22        7            4          49     16     28
 13    16       -5           -2          25      4     10
 25    15        7           -3          49      9    -21
 22    18        4            0          16      0      0
 11    18       -7            0          49      0      0
 12    17       -6           -1          36      1      6
 25    23        7            5          49     25     35
 21    24        3            6           9     36     18
 20    17        2           -1           4      1     -2
184   182        4            2         350    128    122
\[ r = \frac{\sum uv - \dfrac{\sum u \cdot \sum v}{n}}{\sqrt{\left[\sum u^2 - \dfrac{(\sum u)^2}{n}\right]\left[\sum v^2 - \dfrac{(\sum v)^2}{n}\right]}} = \frac{122 - \dfrac{4 \times 2}{10}}{\sqrt{\left[350 - \dfrac{4^2}{10}\right]\left[128 - \dfrac{2^2}{10}\right]}} = \frac{121.2}{\sqrt{348.4 \times 127.6}} = \frac{121.2}{210.85} = 0.5748. \]
Ex. 7. Given the following pairs of values of the variables X and Y:
(2, 16), (4, 14), (6, 12), (8, 10), (10, 8), ..., (14, 4), (16, 2),
calculate $\rho(X, Y)$ and draw the scatter diagram.
Sol. Calculate $\rho(X, Y)$ yourself. Ans. -1.
To draw the scatter diagram, plot the points (2, 16), (4, 14), ..., (14, 4) and (16, 2). Fig. 17.05 shows the required scatter diagram.
Fig. 17.05
The points in the scatter diagram proceed in a line from the top to the bottom, which indicates that X and Y are in perfect negative correlation.
Ex. 8. (a) Calculate the value of the correlation coefficient for the following data:
(1, 13), (2, 23), (3, 33), (4, 43), (5, 53), (6, 63), (7, 73), (8, 83), (9, 93), (10, 103), (11, 10.5), (12, 11), (13, 11.5), (14, 12), (15, 12.5), (16, 13), (17, 13.5), (18, 14), (19, 14.5), (20, 15).
(b) Draw the scatter diagram.
(c) Comment on the result.
Sol. Here n = 20. Let the assumed mean for the first variate X be 10 and that for the second variate Y be 13.
  x      y     u = x - 10   v = y - 13    u^2      v^2       uv
  1     13        -9           0           81        0        0
  2     23        -8          10           64      100      -80
  3     33        -7          20           49      400     -140
  4     43        -6          30           36      900     -180
  5     53        -5          40           25     1600     -200
  6     63        -4          50           16     2500     -200
  7     73        -3          60            9     3600     -180
  8     83        -2          70            4     4900     -140
  9     93        -1          80            1     6400      -80
 10    103         0          90            0     8100        0
 11   10.5         1        -2.5            1     6.25     -2.5
 12     11         2          -2            4     4.00       -4
 13   11.5         3        -1.5            9     2.25     -4.5
 14     12         4          -1           16     1.00       -4
 15   12.5         5        -0.5           25     0.25     -2.5
 16     13         6           0           36     0.00        0
 17   13.5         7         0.5           49     0.25      3.5
 18     14         8           1           64     1.00        8
 19   14.5         9         1.5           81     2.25     13.5
 20     15        10           2          100     4.00       20
\[ \sum u = 10, \quad \sum v = 447.5, \quad \sum u^2 = 670, \quad \sum v^2 = 28521.25, \quad \sum uv = -1172.5 \]
\[ \rho(X, Y) = \frac{\sum uv - \dfrac{(\sum u)(\sum v)}{n}}{\sqrt{\left[\sum u^2 - \dfrac{(\sum u)^2}{n}\right]\left[\sum v^2 - \dfrac{(\sum v)^2}{n}\right]}} = \frac{-1172.5 - \dfrac{10 \times 447.5}{20}}{\sqrt{\left[670 - \dfrac{100}{20}\right]\left[28521.25 - \dfrac{(447.5)^2}{20}\right]}} \]
\[ = \frac{-1396.25}{\sqrt{665 \times 18508.4375}} = \frac{-1396.25}{3508.29} = -0.39798. \]
Plotting the points (1, 13), (2, 23), ..., (20, 15) we obtain the scatter diagram as shown in Fig. 17.06.
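Formula III claims that shifting both series by assumed means leaves r unchanged. The sketch below checks this numerically on the twenty pairs of Ex. 8, with assumed means A = 10 and B = 13 as in the solution; the helper name `r_from_series` is an illustrative assumption.

```python
from math import sqrt

# Ex. 8 data: x = 1..20; y = 13, 23, ..., 103 followed by 10.5, 11, ..., 15.
xs = list(range(1, 21))
ys = [10 * x + 3 for x in range(1, 11)] + [10.5 + 0.5 * k for k in range(10)]

A, B = 10, 13                 # assumed means, as in the worked solution
us = [x - A for x in xs]      # u = x - A
vs = [y - B for y in ys]      # v = y - B


def r_from_series(ps, qs):
    """Correlation from totals of two series (same shape as Formulas II and III)."""
    n = len(ps)
    sp, sq = sum(ps), sum(qs)
    spq = sum(p * q for p, q in zip(ps, qs))
    spp = sum(p * p for p in ps)
    sqq = sum(q * q for q in qs)
    return (spq - sp * sq / n) / sqrt((spp - sp ** 2 / n) * (sqq - sq ** 2 / n))


r_assumed = r_from_series(us, vs)   # Formula III on deviations from assumed means
r_direct = r_from_series(xs, ys)    # Formula II on the raw values

print(sum(us), sum(vs), sum(u * v for u, v in zip(us, vs)))  # 10 447.5 -1172.5
print(round(r_assumed, 3))  # -0.398
```

Both routes give the same coefficient up to rounding, so the choice of assumed means affects only the amount of arithmetic, not the answer.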
17.11. Spearman's rank correlation coefficient
Sometimes such problems are faced that it is possible to arrange the various items of a series in serial order but the quantitative measurement of their values is difficult. For example, it is possible for a class teacher to arrange his students in ascending or descending order of intelligence, even though intelligence cannot be measured quantitatively. No doubt, the quantitative study about the intelligence of students can be made by holding an examination and assigning them marks, but this method can never be said to be infallible. There are many such attributes which are incapable of quantitative measurements, for example, honesty, character, morality, etc.
In such cases it is possible to rank the individuals in some order. The most intelligent individual may be given rank 1, the next rank 2, and so on.
The correlation coefficient between two series of ranks is called 'Rank Correlation Coefficient'. The formula for the coefficient of rank correlation as given by Edward Spearman is
\[ R = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} \quad \text{or} \quad R = 1 - \frac{6 \sum D^2}{n^3 - n} \]
where D is the difference between the corresponding ranks of the two series and n is the number of individuals in each series.
Note 1. Instead of assigning ranks 1, 2, 3, ... from highest to lowest, we can also assign these ranks from the lowest to highest, i.e., rank 1 to the least intelligent, rank 2 to the next more intelligent, next rank 3, and so on.
Note 2. Remember that the algebraic sum of the rank differences is always 0, i.e., $\sum D$ is always = 0. If it is not so, then some mistake has been committed at the time of assigning ranks.
Note 3. The interpretations of the values of R are the same as those given earlier in this chapter for r.
17.12. Solved examples
We may come across two types of problems:
(a) When ranks are given
(b) When ranks are not given.
Type 1. When ranks are given
Working rule
Step 1. Compute D, the difference of the ranks.
Step 2. Compute $D^2$ and get the sum $\sum D^2$.
Step 3. Substitute the values in the formula.
Ex. 10. Following are the ranks obtained by 10 students in two subjects, Statistics and Mathematics. To what extent is the knowledge of the students in the two subjects related?
Statistics  : 1  2  3  4  5  6  7   8  9  10
Mathematics : 2  4  1  5  3  9  7  10  6   8
Sol.
Rank in Statistics (x)   Rank in Mathematics (y)   D = x - y   D^2
         1                         2                  -1         1
         2                         4                  -2         4
         3                         1                   2         4
         4                         5                  -1         1
         5                         3                   2         4
         6                         9                  -3         9
         7                         7                   0         0
         8                        10                  -2         4
         9                         6                   3         9
        10                         8                   2         4
                                                  ΣD = 0    ΣD^2 = 40
\[ R = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 40}{10(100 - 1)} = 1 - \frac{240}{990} = +0.76 \]
Caution. When the ranks are already given as in the above example, do not commit the mistake of assigning new ranks.
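The three steps of the working rule for given ranks can be sketched as follows, using the ranks of Ex. 10. The function name `spearman_from_ranks` is illustrative, and the check that the rank differences sum to zero follows the note on $\sum D = 0$.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """R = 1 - 6 * sum(D^2) / (n * (n^2 - 1)) for untied ranks that are already given."""
    n = len(ranks_a)
    # Step 1: differences of corresponding ranks.
    diffs = [a - b for a, b in zip(ranks_a, ranks_b)]
    if sum(diffs) != 0:
        raise ValueError("rank differences must sum to 0; check the ranks")
    # Step 2: sum of squared differences.
    sum_d2 = sum(d * d for d in diffs)
    # Step 3: substitute in the formula.
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


statistics_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mathematics_ranks = [2, 4, 1, 5, 3, 9, 7, 10, 6, 8]

R = spearman_from_ranks(statistics_ranks, mathematics_ranks)
print(round(R, 2))  # 0.76
```

Note that the function uses the ranks exactly as supplied and does not re-rank them, in line with the caution above.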
Type 2. When ranks are not given
When no ranks are given, but actual data are given, then we should assign ranks. We can give ranks by taking the highest value as 1 or the lowest value as 1, the next to the highest (lowest) as 2, and follow the same procedure for both the variables.
Ex. 11. The marks obtained by the students in Physics and in Mathematics are as follows:
Marks in Physics     : 35  23  47  17  10  43   8   6  28
Marks in Mathematics : 30  33  45  23   8  49  12   4  31
Compute their ranks in the two subjects and the coefficient of correlation of ranks. Interpret the result. (I.S.C. 2002)
Sol.
  x     y    Rank in x (R1)   Rank in y (R2)   D = R1 - R2   D^2
 35    30         3                5               -2          4
 23    33         5                3                2          4
 47    45         1                2               -1          1
 17    23         6                6                0          0
 10     8         7                8               -1          1
 43    49         2                1                1          1
  8    12         8                7                1          1
  6     4         9                9                0          0
 28    31         4                4                0          0
                                                        ΣD^2 = 12
\[ r = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 12}{9(81 - 1)} = 1 - \frac{72}{720} = 0.9 \]
This high value of r indicates a very high relationship. This means that the students who are good in Physics are good in Mathematics also, and vice versa.
Ex. 12. Ten competitors in a beauty contest are ranked by three judges, the ranks given by the first, second and third judge being denoted by $R_1$, $R_2$ and $R_3$ respectively. Use the rank correlation coefficient to discuss which pair of judges have the nearest approach to common tastes in beauty.
Sol. Writing $D_{12} = R_1 - R_2$, $D_{13} = R_1 - R_3$ and $D_{23} = R_2 - R_3$, the table of rank differences gives
\[ \sum D_{12}^2 = 74, \qquad \sum D_{13}^2 = 156, \qquad \sum D_{23}^2 = 44, \qquad n = 10. \]
\[ r_{12} = 1 - \frac{6 \sum D_{12}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 74}{10 \times 99} = 0.55 \]
\[ r_{13} = 1 - \frac{6 \sum D_{13}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 156}{10 \times 99} = 0.05 \]
\[ r_{23} = 1 - \frac{6 \sum D_{23}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 44}{10 \times 99} = 0.73 \]
Since $r_{23}$ is maximum, we conclude that the pair of second and third judges has the nearest approach to common tastes in beauty.
Ex. 13. The coefficient of rank correlation of marks obtained by 10 students in English and a second subject was found to be 0.5. It was later discovered that the difference in ranks in the two subjects obtained by one of the students was wrongly taken as 3 instead of 7. Find the correct coefficient of rank correlation. (I.S.C. 2009 Type)
Sol. $r = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$. Substituting the given values, we get
\[ 0.5 = 1 - \frac{6 \sum D^2}{10(100 - 1)} \ \Rightarrow\ \frac{6 \sum D^2}{990} = 1 - 0.5 = 0.5 \ \Rightarrow\ 6 \sum D^2 = 0.5 \times 990 \ \Rightarrow\ \sum D^2 = 82.5 \]
$\therefore$ corrected value of $\sum D^2 = 82.5 - 3^2 + 7^2 = 122.5$.
The correct value of
\[ r = 1 - \frac{6 \times 122.5}{10 \times 99} = 1 - \frac{735}{990} = 1 - 0.74 = 0.26 \text{ approx.} \]
Ex. 14. Find out rank correlation from the following data:
S.N.             :  1   2   3   4   5   6   7   8   9   10
Rank differences : -2  -4  -1  +3  +2   0   ?  +3  +3  -2
Sol. First we find the unknown rank difference by using the fact that $\sum D = 0$. This gives:
value of the unknown rank difference + 11 - 9 = 0, i.e., the unknown rank difference = -2.
Now we have
S.N. :  1   2   3   4   5   6   7   8   9   10
D    : -2  -4  -1   3   2   0  -2   3   3   -2     ΣD = 0
D^2  :  4  16   1   9   4   0   4   9   9    4     ΣD^2 = 60
N = 10.
\[ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 60}{10(10^2 - 1)} = 1 - \frac{360}{990} = \frac{630}{990} = \frac{7}{11} = 0.636. \]
17.13. Correlation for tied ranks
If there is a tie in either or both series, i.e., two or more individuals (or items) have the same score, then the Spearman's rank correlation coefficient formula,
\[ R = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}, \]
fails to give the correct coefficient and a correction or modification in the formula becomes necessary, because this formula is based on the supposition that ranks of various items are different and that no rank is given to more than one item.
The problem is solved by assigning a common rank to each of the individuals who are in a tie. This common rank is the average of the ranks of these individuals. For example, we have the series
90  85  80  80  80  80  78  72  69  69  55  54
Rank 1 and rank 2 are assigned to the two values 90 and 85, and the next value 80 appears four times. Then the rank $\dfrac{3 + 4 + 5 + 6}{4} = 4.5$ will be assigned to each of the values 80 at the four places, and the next lower value than 80, viz., 78, would be assigned the rank 7 because 6 ranks have already been assigned. Now, we see that the value 69 is repeated twice at ranks 9 and 10, so the common rank assigned to each value would be $\dfrac{9 + 10}{2} = 9.5$, and the next value 55 will have the rank 11 and the value 54 the rank 12. Thus, we have
Series (X)    : 90  85   80   80   80   80  78  72   69   69  55  54
Rank assigned :  1   2  4.5  4.5  4.5  4.5   7   8  9.5  9.5  11  12
Note: We could have started by assigning ranks from the lowest value, i.e., rank 1 to 54, rank 2 to 55, rank $\dfrac{3 + 4}{2} = 3.5$ to each of the values 69, rank 5 to 72, rank 6 to 78, then rank $\dfrac{7 + 8 + 9 + 10}{4} = 8.5$ to each of the values 80, rank 11 to 85 and lastly rank 12 to 90. Then, the ranks would have been as under:
Series (X)    : 90  85   80   80   80   80  78  72   69   69  55  54
Rank assigned : 12  11  8.5  8.5  8.5  8.5   6   5  3.5  3.5   2   1
Correction factor: After assigning a common rank to items with repeated values, an adjustment or correction factor is added to the Spearman's rank correlation formula. If in a series there are m items whose ranks are common, then for correction $\dfrac{1}{12}(m^3 - m)$ is added to $\sum D^2$ for each repeating value in both the series. The modified formula, then, is given by
\[ R = 1 - \frac{6\left[\sum D^2 + \dfrac{1}{12}(m_1^3 - m_1) + \dfrac{1}{12}(m_2^3 - m_2) + \ldots\right]}{n(n^2 - 1)} \]
where $m_1, m_2, m_3, \ldots$ are the numbers of times a value is repeated.
Ex. 15. The figures below give the numbers of passenger-carrying vehicles under repair, and the number of persons killed in train accidents in Great Britain in each of 11 years. Find the correlation between these figures and comment on the result. The method of ranks may be used. [H.S.C.]
Sol. Let x denote the number of vehicles under repair (1000's) and y the number of persons killed, and let the ranks be assigned from the highest value.
In the x-series 5.4 occurs twice. The rank of 5.4 is $\dfrac{1 + 2}{2} = 1.5$. The next lower value 4.9 is assigned the rank 3. Similarly, 2.7 occurs thrice. The rank of 2.7 is $\dfrac{6 + 7 + 8}{3} = 7$, and next 2.5 occurs thrice. The rank of 2.5 is $\dfrac{9 + 10 + 11}{3} = 10$.
From the table of ranks, $\sum D^2 = 115.5$ and n = 11.
According to these ranks the coefficient will have to be corrected by adding $\dfrac{1}{12}(m^3 - m)$ to the value of $\sum D^2$ (Art. 17.13). In respect of the x-series, this addition will be
\[ \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(3^3 - 3) = 0.5 + 2 + 2 = 4.5, \]
since 5.4 occurs twice, 2.7 occurs thrice and 2.5 occurs thrice. If there had been repeated values in the y-series also, we would have added the correction for them also.
\[ \therefore \quad R = 1 - \frac{6\left[\sum D^2 + 4.5\right]}{n(n^2 - 1)} = 1 - \frac{6[115.5 + 4.5]}{11^3 - 11} = 1 - \frac{6 \times 120}{1320} = 1 - 0.545 = 0.455 \]
Since the value 0.455 of r lies between 0.40 and 0.70, it signifies substantial or marked relationship. It means that ordinarily, the higher the number of passenger-carrying vehicles under repair, the higher is the number of persons killed in train accidents in Britain. But it does not necessarily follow that one is the cause of the other. There can be many other causes, for example, wear and tear, etc.
Ex. 16. Find out the rank correlation coefficient between the heights of fathers and sons from the following data:
Height of fathers in inches : 65  66  67  67  68  69  70  72
Height of sons in inches    : 67  68  65  68  72  72  69  71
Sol.
  x     y    Rank of x (R1)   Rank of y (R2)   D = R1 - R2    D^2
 65    67         8                7                1          1
 66    68         7               5.5              1.5        2.25
 67    65        5.5               8              -2.5        6.25
 67    68        5.5              5.5               0          0
 68    72         4               1.5              2.5        6.25
 69    72         3               1.5              1.5        2.25
 70    69         2                4               -2          4
 72    71         1                3               -2          4
                                                         ΣD^2 = 26
In the x-series, 67 occurs twice and its rank is $\dfrac{5 + 6}{2} = 5.5$.
In the y-series, 72 occurs twice and its rank is $\dfrac{1 + 2}{2} = 1.5$; 68 occurs twice and its rank is $\dfrac{5 + 6}{2} = 5.5$.
The correction factor for the x-series $= \dfrac{1}{12}(2^3 - 2) = 0.5$.
The correction factor for the y-series $= \dfrac{1}{12}(2^3 - 2) + \dfrac{1}{12}(2^3 - 2) = 0.5 + 0.5 = 1$.
$\therefore$ Total correction factor = 0.5 + 1 = 1.5.
\[ R = 1 - \frac{6\left[\sum D^2 + 1.5\right]}{n(n^2 - 1)} = 1 - \frac{6(26 + 1.5)}{8(8^2 - 1)} = 1 - \frac{6 \times 27.5}{504} = \frac{504 - 165}{504} = \frac{339}{504} = 0.672619 \approx 0.673 \]
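The tied-rank procedure (common average ranks plus the correction factor of Art. 17.13) can be sketched as below and checked against Ex. 16. The function names `average_ranks`, `tie_correction` and `spearman_with_ties` are illustrative assumptions; ranks are assigned from the highest value, which is one of the two orders the text allows.

```python
from collections import Counter


def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value in sorted order is equal (a tie).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        common = ((pos + 1) + (end + 1)) / 2  # average of the ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = common
        pos = end + 1
    return ranks


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value repeated m times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values())


def spearman_with_ties(xs, ys):
    """Modified formula: R = 1 - 6 * (sum D^2 + corrections) / (n * (n^2 - 1))."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]
R = spearman_with_ties(fathers, sons)
print(average_ranks(fathers))  # [8.0, 7.0, 5.5, 5.5, 4.0, 3.0, 2.0, 1.0]
print(round(R, 3))  # 0.673
```

Ranking from the lowest value instead would change every rank but not the rank differences in absolute value, so the coefficient would be the same.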
Ex. 17. Find the Spearman's rank coefficient of correlation from the following data:
X : 48  33  40   9  16  16  65  25  16  57
Y : 13  13  24   6  15   4  20   9   6  19
Sol. We will solve this question by assigning ranks from lowest to highest, i.e., rank 1 to the lowest value, rank 2 to the next value and so on.
Series X   Series Y   Rank of X (R1)   Rank of Y (R2)   Rank difference D = R1 - R2   Squares D^2
   48         13            8               5.5                 +2.5                     6.25
   33         13            6               5.5                 +0.5                     0.25
   40         24            7               10                  -3                       9
    9          6            1               2.5                 -1.5                     2.25
   16         15            3                7                  -4                       16
   16          4            3                1                  +2                       4
   65         20           10                9                  +1                       1
   25          9            5                4                  +1                       1
   16          6            3               2.5                 +0.5                     0.25
   57         19            9                8                  +1                       1
n = 10                                                         ΣD = 0                ΣD^2 = 41
In series X, 16 is repeated 3 times, so the factor $\dfrac{1}{12}(3^3 - 3)$ will be added to $\sum D^2$. In series Y, 13 is repeated 2 times and 6 is also repeated 2 times, so the factor $\dfrac{1}{12}(2^3 - 2)$ will be added twice to $\sum D^2$.
\[ R = 1 - \frac{6\left[\sum D^2 + \dfrac{1}{12}(m^3 - m) + \ldots\right]}{n(n^2 - 1)} = 1 - \frac{6\left[41 + \dfrac{1}{12}(3^3 - 3) + \dfrac{1}{12}(2^3 - 2) + \dfrac{1}{12}(2^3 - 2)\right]}{10(10^2 - 1)} \]
\[ = 1 - \frac{6[41 + 2 + 0.5 + 0.5]}{990} = 1 - \frac{6 \times 44}{990} = \frac{990 - 264}{990} = \frac{726}{990} = 0.733. \]
EXERCISE 17 (b)
Find Rank Correlation Coefficient by Spearman's formula in the following questions.
Type 1. (Based on the formula $r = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$)
1. The marks obtained by nine students in Physics and Mathematics are given below:
Physics     : 48  60  72  62  56  40  39  52  30
Mathematics : 62  78  65  70  38  54  60  32  31
Calculate Spearman's coefficient of rank correlation and interpret the result.