
CORRELATION

Correlation defined
If two variables are so related that a change in one is accompanied by a change in the other in such a way that (i) an increase in one is accompanied by an increase or decrease in the other, or (ii) a decrease in one is accompanied by a decrease or increase in the other, and the greater the magnitude of the change in one, the greater the magnitude of the change in the other, then the variables are said to be correlated.

For example,
(i) an increase in the intensity of cold results in greater sale of woollen clothes;
(ii) an increase in the price of a commodity results in a decrease in its demand;
(iii) an increase in the heights of children is accompanied by an increase in their weights;
(iv) a decrease in the price of a commodity is accompanied by an increase in its demand.

Positive and negative correlation
Whether correlation is positive or negative would depend upon the direction in which the variables are moving. If both the variables move in the same direction, i.e., when an increase or decrease in one corresponds to an increase or decrease respectively in the other, then the correlation between the variables is said to be positive or direct. If both the variables move in opposite directions, i.e., when an increase in one corresponds to a decrease in the other or a decrease in one corresponds to an increase in the other, then the correlation between the two variables is said to be negative or inverse.

Illustrations
Positive correlation
Height (cm) Weight (kg)
: :

160 58

t62
60
18

163

t66
65

6t l5
120

110 68
t2 r80

175 70

Negative correlation
Price (Rs per unit)
Demand (units)

:20 : 80

t4
150

10

100

200

Degree of correlation
Correlation may be perfect or imperfect. When the changes in the corresponding values of two variables are proportional, directly or inversely, the correlation between them is said to be perfect. It is perfect positive if the increase (or decrease) in the values of one variable is accompanied by a proportional increase (or decrease) in the values of a second variable, e.g., the correlation between the circumferences of circles and their radii is perfect positive. If the reverse is the case, i.e., if the increase (or decrease) in one is accompanied by a proportional decrease (or increase) in the other, the correlation between the two variables is said to be perfect negative, e.g., if a rectangle has a constant area, the correlation between the lengths of its sides is perfect negative.

Such perfect (positive or negative) correlations are met with only in exact sciences such as Mathematics, Physics, Chemistry, etc., but not in social and economic phenomena. In such phenomena, the changes in one variable are not generally proportional to the changes in the other. In this case, the correlation between them, if it exists, is said to be imperfect positive or negative depending upon its nature. Imperfect correlation again may be high, moderate or low. The imperfect correlation lies between perfect correlation and no correlation. Thus, we may have high positive correlation, e.g., between incomes and standard of living, or we may have high negative correlation, e.g., between supply and price of a commodity. Similarly, we have situations where correlation may be moderate (or low), negative or positive.

Perfect positive correlation. All the plotted points lie on a straight line rising from the lower left hand corner to the upper right hand corner.

Perfect negative correlation. All the plotted points lie on a straight line falling from the upper left hand corner to the lower right hand corner.

High degree positive correlation. The plotted points (x_i, y_i), i = 1, 2, 3, ..., n fall in a narrow band and the points are rising from the lower left hand corner to the upper right hand corner.

High degree negative correlation. The plotted points (x_i, y_i), i = 1, 2, 3, ..., n fall in a narrow band from the upper left hand corner to the lower right hand corner.

Fig. 17.01

Methods of studying correlation
The various methods to determine whether two variables are correlated or not are:
(i) Scatter diagram method
(ii) Karl Pearson's coefficient of correlation
(iii) Rank method (Spearman's and Kendall's coefficient)
Out of the above, only the Karl Pearson's and Spearman's methods are in the syllabus.

17.05. Scatter diagram
Scatter diagram is a graphic device for finding correlation between two variables. One variable, normally an independent variable or time, is plotted on the horizontal axis. This is also called a predicting variable. The other variable, known as the dependent variable or the one to be predicted, is shown on the vertical axis. The movements of the pairs of these variables shown by dots on the graph reveal whether they move in the same or the opposite direction.

If the points form a band of some width, it will indicate imperfect correlation between the two variables. The direction of the band indicates the nature of correlation. If the band slopes upward, it indicates positive correlation and if it slopes downward then it indicates negative correlation. The width of the band gives an idea of the degree of correlation. The narrower the band, the greater is the degree of correlation.

When the points do not form a band, i.e., they are scattered in all directions, it indicates that there is no correlation between the variables.

In the case of perfect correlation, the points will be on a straight line. The method is mainly used when we are interested only in finding out whether there is correlation and in getting a rough idea about its nature and degree. It does not give us any measure of correlation. The following diagrams illustrate the various cases.

Fig. 17.02: (a) Perfect Positive Correlation, (b) Perfect Negative Correlation, (c) High Degree Positive Correlation, (d) High Degree Negative Correlation, (e) Low Degree Positive Correlation, (f) Low Degree Negative Correlation, (g) No Correlation

The line of best fit
Often there is not a straight line which passes through all the points but we can still draw the line which comes closest to fitting all the points. We can estimate the position of the line by eye. This line shows the general trend of the relationship between the two sets of data. It may or may not pass through any of the data points.

The closer the points are to this line of best fit, the higher the correlation. The gradient of the line is not important, except that a vertical or horizontal line of best fit means that the variables are not correlated.

Here are some typical examples and their interpretation.

Fig. 17.03: High positive correlation; Low positive correlation (not a strong relationship); No correlation; High negative correlation; Low negative correlation (not a strong relationship)

Note. The method to find the equation of the line of best fit, also called the line of regression, is discussed later in this chapter.
Ex. 1. Construct a straight line which approximates the following data, i.e., the line of best fit.
X :  1  3  4  6  8  9  11  14
Y :  1  2  4  4  5  7   8   9
Sol. Plot the points (1, 1), (3, 2), (4, 4), (6, 4), (8, 5), (9, 7), (11, 8) and (14, 9) on a coordinate system as shown in Fig. 17.04. A straight line approximating the data is drawn freehand in the figure.

Fig. 17.04
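The freehand line of Ex. 1 can be cross-checked numerically. The sketch below is not the method of this article (the regression line is treated later in the chapter); it assumes the usual least-squares formulas for slope and intercept, and the function name `least_squares_line` is only illustrative.

```python
# Least-squares line for the Ex. 1 data: a numerical stand-in for the freehand line.
# Assumption: ordinary least squares of y on x; not the book's freehand method.
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line y = slope * x + intercept."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.3f} x + {intercept:.3f}")  # y = 0.636 x + 0.545
```

A line drawn by eye through the plotted points should have roughly this gradient and cut the vertical axis a little above the origin.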

17.07. Karl Pearson's coefficient of correlation
A mathematical method for measuring the degree of correlation between two variables X and Y was suggested by Karl Pearson. It is known as the Pearsonian coefficient of correlation and is denoted by the symbol r or ρ(X, Y)*. The formula for finding the correlation coefficient is based on the concept of co-variance which we define below:

Co-Variance
Definition. If the variable X takes the values $x_1, x_2, x_3, \dots, x_n$ and another variable Y takes the values $y_1, y_2, y_3, \dots, y_n$, then the covariance between the two variables X and Y is written as Cov(X, Y) and is defined as

$$\operatorname{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}),$$

or, written more simply, $\operatorname{Cov}(X, Y) = \frac{1}{n}\sum d_x d_y$, where $d_x = x_i - \bar{x}$, $d_y = y_i - \bar{y}$, and $\bar{x}$ and $\bar{y}$ denote the arithmetic means of the two series, i.e.,

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}, \qquad \bar{y} = \frac{y_1 + y_2 + \dots + y_n}{n}.$$

* The symbol ρ (pronounced 'rho') is a letter of the Greek alphabet corresponding to the English letter r.

If we divide the covariance by the product of the individual standard deviations, the quotient so obtained is called the correlation coefficient. As it was suggested by Karl Pearson, it is called Pearson's coefficient of correlation.

Formula (I)

$$\rho(X, Y) \text{ or } r = \frac{\operatorname{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\sum d_x d_y}{n\,\sigma_x \sigma_y}, \quad \text{where } \sigma_x = \sqrt{\frac{\sum d_x^2}{n}} \text{ and } \sigma_y = \sqrt{\frac{\sum d_y^2}{n}};$$

therefore,

$$\rho(X, Y) \text{ or } r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2}\,\sqrt{\sum d_y^2}},$$

where $d_x = x_i - \bar{x}$, $d_y = y_i - \bar{y}$ are the deviations taken from the actual mean.

Since the above formula is based on $\sum d_x d_y$, i.e., the product of the deviations of the observed values of X and Y from their means, this method is also called the product moment method.

It will be seen later on that the coefficient of correlation r is such that $-1 \le r \le 1$. [See Q. 30 in Exercise 17 (a) and Art. 17.10]

If r = 1, there is perfect positive correlation.
If r = -1, there is perfect negative correlation.
If r = 0, there is no correlation.

Also as a general rule
r from ± 0.00 to ± 0.20 denotes indifferent or negligible relationship;
r from ± 0.20 to ± 0.40 denotes low correlation or slight relationship;
r from ± 0.40 to ± 0.70 denotes substantial or marked relationship;
r from ± 0.70 to ± 1.00 denotes high to very high relationship.
This interpretation is somewhat tentative and can only be accepted as a general guide.
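The definition of r as covariance divided by the product of the standard deviations, together with the general rule for reading its size, can be sketched as follows. The function names are illustrative, and one assumption is made that the text does not settle: a boundary value such as 0.20 or 0.40 is placed in the higher band.

```python
from math import sqrt

def pearson_r(xs, ys):
    """r = Cov(X, Y) / (sigma_x * sigma_y), all computed with divisor n."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)

def describe(r):
    """General guide from the text; boundary values go to the higher band (assumption)."""
    size = abs(r)
    if size < 0.20:
        return "negligible"
    if size < 0.40:
        return "low"
    if size < 0.70:
        return "substantial"
    return "high to very high"

# Values that rise in exact proportion give perfect positive correlation.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(r, 6), describe(r))
```

The sign of r gives the direction of the relationship; `describe` looks only at its size.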

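Ex. If r = 0.5, Σxy = 120, σ_y = 8 and Σx² = 90, find the number of items (x and y are deviations from the arithmetic means).
Sol. We have

$$r = \frac{\sum xy}{n\,\sigma_x \sigma_y} \;\Rightarrow\; r^2 = \frac{(\sum xy)^2}{n^2 \sigma_x^2 \sigma_y^2}, \quad \text{where } \sigma_x^2 = \frac{\sum x^2}{n}.$$

$$\therefore\; 0.25 = \frac{(120)^2}{n^2 \cdot \frac{90}{n} \cdot 64} = \frac{14400}{n \times 90 \times 64} \;\Rightarrow\; n = \frac{14400}{0.25 \times 5760} = 10.$$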

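Ex. The following table gives the test scores and sales by nine salesmen during last one year:
Test scores            : 14  19  24  21  26  22  15  20  19
Sales (in '000 Rupees) : 31  36  48  37  50  45  33  41  39
Compute the Karl Pearson's coefficient of correlation and interpret the result. (I.S.C. 1993)

Sol.
  x    y    d_x   d_y   d_x²   d_y²   d_x d_y
 14   31    -6    -9     36     81      54
 19   36    -1    -4      1     16       4
 24   48     4     8     16     64      32
 21   37     1    -3      1      9      -3
 26   50     6    10     36    100      60
 22   45     2     5      4     25      10
 15   33    -5    -7     25     49      35
 20   41     0     1      0      1       0
 19   39    -1    -1      1      1       1
Σx = 180, Σy = 360, Σd_x² = 120, Σd_y² = 346, Σd_x d_y = 193

$$\bar{x} = \frac{180}{9} = 20, \qquad \bar{y} = \frac{360}{9} = 40,$$

$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}} = \frac{193}{\sqrt{120 \times 346}} = \frac{193}{203.76} = 0.947 \text{ approx.},$$

which shows a very high relationship.

Ex. Compute the correlation coefficient between the corresponding values of X and Y in the following table: (I.S.C. 2007)
X :  2   4   5   6   8  11
Y : 18  12  10   8   7   5

Sol.
  x    y    d_x   d_y   d_x²   d_y²   d_x d_y
  2   18    -4     8     16     64     -32
  4   12    -2     2      4      4      -4
  5   10    -1     0      1      0       0
  6    8     0    -2      0      4       0
  8    7     2    -3      4      9      -6
 11    5     5    -5     25     25     -25
Σx = 36, Σy = 60, Σd_x² = 50, Σd_y² = 106, Σd_x d_y = -67

$$\bar{x} = \frac{36}{6} = 6, \qquad \bar{y} = \frac{60}{6} = 10,$$

$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}} = \frac{-67}{\sqrt{50 \times 106}} = \frac{-67}{72.8} = -0.92.$$

Hence, the variables X and Y are highly negatively correlated.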
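The column totals of the two worked examples above can be checked with a short sketch of the product moment method. The data are the test scores and sales of the nine salesmen and the six (X, Y) pairs as read from those examples; the function name `product_moment` is illustrative.

```python
from math import sqrt

def product_moment(xs, ys):
    """Return (sum d_x d_y, sum d_x^2, sum d_y^2, r) using deviations from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    s_dx2 = sum(a * a for a in dx)
    s_dy2 = sum(b * b for b in dy)
    return s_dxdy, s_dx2, s_dy2, s_dxdy / sqrt(s_dx2 * s_dy2)

scores = [14, 19, 24, 21, 26, 22, 15, 20, 19]
sales = [31, 36, 48, 37, 50, 45, 33, 41, 39]
print(product_moment(scores, sales))   # totals 193, 120, 346 and r close to 0.947

X = [2, 4, 5, 6, 8, 11]
Y = [18, 12, 10, 8, 7, 5]
print(product_moment(X, Y))            # totals -67, 50, 106 and r close to -0.92
```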

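17.08. Second formula for ρ(X, Y) (using directly the values $x_i$ and $y_i$, i.e., without using deviations from the means)
We know that

$$\sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2) = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - 2\,\frac{\sum x_i}{n}\sum x_i + n\left(\frac{\sum x_i}{n}\right)^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}.$$

Similarly,

$$\sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}.$$

Also,

$$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i y_i - x_i\bar{y} - y_i\bar{x} + \bar{x}\bar{y}) = \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y} = \sum x_i y_i - n\bar{x}\bar{y} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}.$$

$$\therefore\; \rho(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$

or

$$\rho(X, Y) \text{ or } r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}} \qquad \text{Formula (II)}$$

where x and y stand for the values of items in the X and Y series.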
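The second formula needs only five totals, which makes it easy to sketch in code. The function name `pearson_r_direct` is illustrative, and the ten pairs used here are small whole numbers chosen for illustration.

```python
from math import sqrt

def pearson_r_direct(xs, ys):
    """Formula (II): r from the sums of x, y, x^2, y^2 and xy, with no deviations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
    return numerator / denominator

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3, 10, 5, 1, 2, 9, 4, 8, 7, 6]
print(round(pearson_r_direct(xs, ys), 3))  # 0.224
```

Because the derivation above is an identity, this function returns the same value as the deviation form for any data; it is simply more convenient when the values are small.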

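Ex. Calculate the coefficient of correlation between x and y for the following data.
x :  1   2   3   4   5   6   7   8   9  10
y :  3  10   5   1   2   9   4   8   7   6

Sol. Since the given values are small we can use the formula

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}}.$$

   x     y     x²     y²     xy
   1     3      1      9      3
   2    10      4    100     20
   3     5      9     25     15
   4     1     16      1      4
   5     2     25      4     10
   6     9     36     81     54
   7     4     49     16     28
   8     8     64     64     64
   9     7     81     49     63
  10     6    100     36     60
Σx = 55, Σy = 55, Σx² = 385, Σy² = 385, Σxy = 321

$$r = \frac{321 - \dfrac{55 \times 55}{10}}{\sqrt{(385 - 302.5)(385 - 302.5)}} = \frac{321 - 302.5}{\sqrt{82.5 \times 82.5}} = \frac{18.5}{82.5} = 0.224.$$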

Ex. 5. In order to find the correlation coefficient between two variables x and y from 12 pairs of observations, the following calculations were made: Σx = 30, Σy = 5, Σx² = 670, Σy² = 285, Σxy = 334. On subsequent verification it was found that the pair (x = 11, y = 4) was copied wrongly, the correct values being (x = 10, y = 14). Find the correct value of correlation coefficient.

Sol. Corrected Σx = given Σx - incorrect value + correct value = 30 - 11 + 10 = 29.
Similarly, corrected Σy = 5 - 4 + 14 = 15,
corrected Σx² = 670 - (11)² + (10)² = 649,
corrected Σy² = 285 - (4)² + (14)² = 465,
corrected Σxy = 334 - 11 × 4 + 10 × 14 = 430.
The correct value of correlation coefficient is given by

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}} = \frac{430 - \dfrac{29 \times 15}{12}}{\sqrt{\left[649 - \dfrac{(29)^2}{12}\right]\left[465 - \dfrac{(15)^2}{12}\right]}} = \frac{393.75}{\sqrt{578.92 \times 446.25}} = 0.7747.$$
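The correction in Ex. 5 amounts to removing the wrong pair from each of the five totals and adding the right pair. A minimal sketch follows; the helper name `replace_pair` is illustrative, and n = 12 is taken as the value consistent with the worked numerator 393.75.

```python
from math import sqrt

n = 12  # number of pairs; the value consistent with the numerator 393.75
sums = {"x": 30, "y": 5, "xx": 670, "yy": 285, "xy": 334}

def replace_pair(sums, wrong, right):
    """Remove a wrongly copied pair from the totals and add the correct pair."""
    (wx, wy), (rx, ry) = wrong, right
    return {
        "x": sums["x"] - wx + rx,
        "y": sums["y"] - wy + ry,
        "xx": sums["xx"] - wx ** 2 + rx ** 2,
        "yy": sums["yy"] - wy ** 2 + ry ** 2,
        "xy": sums["xy"] - wx * wy + rx * ry,
    }

corrected = replace_pair(sums, wrong=(11, 4), right=(10, 14))
numerator = corrected["xy"] - corrected["x"] * corrected["y"] / n
denominator = sqrt((corrected["xx"] - corrected["x"] ** 2 / n)
                   * (corrected["yy"] - corrected["y"] ** 2 / n))
r = numerator / denominator
print(corrected, round(r, 4))
```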

17.09. Third formula for ρ(X, Y) (when the deviations are taken from an assumed mean)
When the means $\bar{x}$ and $\bar{y}$ are large or involve fractions, then neither of the two formulas given above is convenient. The work can be simplified by considering the deviations of x and y from assumed means a and b respectively for the variates x and y.

If $u_i = \dfrac{x_i - a}{h}$, $v_i = \dfrac{y_i - b}{k}$, i.e., $x_i = hu_i + a$, $y_i = kv_i + b$, then on substituting in the formula

$$\rho(X, Y) = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\left[\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}\right]\left[\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}\right]}}$$

and simplifying, we obtain the following:

$$\rho(X, Y) = \frac{\sum u_i v_i - \dfrac{\sum u_i \sum v_i}{n}}{\sqrt{\left[\sum u_i^2 - \dfrac{(\sum u_i)^2}{n}\right]\left[\sum v_i^2 - \dfrac{(\sum v_i)^2}{n}\right]}}.$$

Using short notations u and v in place of $u_i$ and $v_i$ respectively, we may write this in a simpler and convenient form as under:

$$r = \frac{\sum uv - \dfrac{(\sum u)(\sum v)}{n}}{\sqrt{\left[\sum u^2 - \dfrac{(\sum u)^2}{n}\right]\left[\sum v^2 - \dfrac{(\sum v)^2}{n}\right]}}$$

where $u = X - A$ or $\dfrac{X - A}{h}$, $v = Y - B$ or $\dfrac{Y - B}{k}$, A and B being the assumed means and n the number of pairs of observations. This formula is used when actual means are fractions.

In some books you will find the notation $d_x$ and $d_y$ for u and v respectively. Using this notation the above formula can be written as

$$\rho(X, Y) \text{ or } r(X, Y) = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\left[\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}\right]\left[\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}\right]}}.$$

In this book we have used the notation $d_x$ and $d_y$ respectively for the deviations from the actual means $\bar{x}$ and $\bar{y}$.

Note. Find the arithmetic means right in the beginning to know whether the means are whole numbers or fractions so that you may apply the formula accordingly. If the given values are small, then apply Formula (II) involving direct values only.
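The third formula works because r does not change when each series is shifted by a constant and divided by a positive constant. The sketch below checks this numerically on the salesmen data used earlier in the chapter. The function names and the particular assumed means (18 and 35) and divisors (2 and 5) are illustrative choices, and h and k are assumed positive.

```python
from math import sqrt

def r_direct(us, vs):
    """Direct-values form of r, applied to whatever pair of series is passed in."""
    n = len(us)
    numerator = sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n
    var_u = sum(u * u for u in us) - sum(us) ** 2 / n
    var_v = sum(v * v for v in vs) - sum(vs) ** 2 / n
    return numerator / sqrt(var_u * var_v)

def r_assumed_mean(xs, ys, a, b, h=1, k=1):
    """Take u = (x - a) / h and v = (y - b) / k (h, k > 0), then apply the same formula."""
    us = [(x - a) / h for x in xs]
    vs = [(y - b) / k for y in ys]
    return r_direct(us, vs)

xs = [14, 19, 24, 21, 26, 22, 15, 20, 19]   # test scores
ys = [31, 36, 48, 37, 50, 45, 33, 41, 39]   # sales
r_plain = r_direct(xs, ys)
r_shifted = r_assumed_mean(xs, ys, a=18, b=35)
r_scaled = r_assumed_mean(xs, ys, a=18, b=35, h=2, k=5)
print(round(r_plain, 4), round(r_shifted, 4), round(r_scaled, 4))
```

All three printed values agree, so the assumed means may be chosen purely to make the arithmetic light.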
Interpretation of correlation coefficient
The coefficient of correlation shall always be between -1 and +1.
When r is +1, there is perfect positive correlation between the variables.
When r is -1, there is perfect negative correlation between the variables.
When r is between 0.7 to 0.999, there is a high degree of correlation between the variables. The correlation shall be positive if the sign of r is plus (+) and negative if the sign of r is minus (-).
When r is between 0.5 to 0.699, there is a moderate degree of correlation between the variables.
When r is less than 0.5, there is a low degree of correlation between the variables.
When r is zero, there is no correlation between the variables.

Ex. 6. Calculate Karl Pearson's correlation coefficient between the marks in English and a second subject obtained by 10 students.
Marks in English (x)        : 10  25  13  25  22  11  12  25  21  20
Marks in second subject (y) : 12  22  16  15  18  18  17  23  24  17

Sol. Here

$$\bar{x} = \frac{184}{10} = 18.4, \qquad \bar{y} = \frac{182}{10} = 18.2.$$

We take 18 as the assumed mean for both the series.

   x     y    u = x - 18   v = y - 18    u²     v²     uv
  10    12       -8           -6         64     36     48
  25    22        7            4         49     16     28
  13    16       -5           -2         25      4     10
  25    15        7           -3         49      9    -21
  22    18        4            0         16      0      0
  11    18       -7            0         49      0      0
  12    17       -6           -1         36      1      6
  25    23        7            5         49     25     35
  21    24        3            6          9     36     18
  20    17        2           -1          4      1     -2
Σu = 4, Σv = 2, Σu² = 350, Σv² = 128, Σuv = 122

$$r = \frac{\sum uv - \dfrac{\sum u \sum v}{n}}{\sqrt{\left[\sum u^2 - \dfrac{(\sum u)^2}{n}\right]\left[\sum v^2 - \dfrac{(\sum v)^2}{n}\right]}} = \frac{122 - \dfrac{4 \times 2}{10}}{\sqrt{\left[350 - \dfrac{4^2}{10}\right]\left[128 - \dfrac{2^2}{10}\right]}} = \frac{121.2}{\sqrt{348.4 \times 127.6}} = \frac{121.2}{210.8} = 0.575 \text{ approx.}$$

Ex. 7. Given the following pairs of values of the variables X and Y: (2, 16), (4, 14), (6, 12), (8, 10), (10, 8), (14, 4), (16, 2).

Sol. Calculate ρ(X, Y) yourself. Ans. -1.
To draw the scatter diagram, plot the points (2, 16), (4, 14), (6, 12), (8, 10), (10, 8), (14, 4) and (16, 2). Fig. 17.05 shows the required scatter diagram.

Fig. 17.05

The points in the scatter diagram proceed in a line from top to the bottom which indicates that X and Y are in perfect negative correlation.

Ex. 8. (a) Calculate the value of the correlation coefficient for the following data: (1, 13), (2, 23), (3, 33), (4, 43), (5, 53), (6, 63), (7, 73), (8, 83), (9, 93), (10, 103), (11, 10.5), (12, 11), (13, 11.5), (14, 12), (15, 12.5), (16, 13), (17, 13.5), (18, 14), (19, 14.5), (20, 15).
(b) Draw the scatter diagram.
(c) Comment on the result.

Sol. Let the assumed mean for the first variate X be 10 and that for the second variate be 13. Here n = 20.

   x      y     u = x - 10   v = y - 13     u²        v²        uv
   1     13        -9            0          81         0         0
   2     23        -8           10          64       100       -80
   3     33        -7           20          49       400      -140
   4     43        -6           30          36       900      -180
   5     53        -5           40          25      1600      -200
   6     63        -4           50          16      2500      -200
   7     73        -3           60           9      3600      -180
   8     83        -2           70           4      4900      -140
   9     93        -1           80           1      6400       -80
  10    103         0           90           0      8100         0
  11   10.5         1         -2.5           1      6.25      -2.5
  12     11         2           -2           4      4.00        -4
  13   11.5         3         -1.5           9      2.25      -4.5
  14     12         4           -1          16      1.00        -4
  15   12.5         5         -0.5          25      0.25      -2.5
  16     13         6            0          36      0.00         0
  17   13.5         7          0.5          49      0.25       3.5
  18     14         8            1          64      1.00         8
  19   14.5         9          1.5          81      2.25      13.5
  20     15        10            2         100      4.00        20
Σu = 10, Σv = 447.5, Σu² = 670, Σv² = 28521.25, Σuv = -1172.5

$$\rho(X, Y) = \frac{\sum uv - \dfrac{(\sum u)(\sum v)}{n}}{\sqrt{\left[\sum u^2 - \dfrac{(\sum u)^2}{n}\right]\left[\sum v^2 - \dfrac{(\sum v)^2}{n}\right]}} = \frac{-1172.5 - \dfrac{10 \times 447.5}{20}}{\sqrt{\left[670 - \dfrac{(10)^2}{20}\right]\left[28521.25 - \dfrac{(447.5)^2}{20}\right]}} = \frac{-1396.25}{\sqrt{665 \times 18508.44}} = \frac{-1396.25}{3508.3} = -0.398.$$

(b) Plotting the points (1, 13), (2, 23), ..., (20, 15) we obtain the scatter diagram as shown in Fig. 17.06.

17.11. Spearman's rank correlation coefficient
Sometimes such problems are faced that it is possible to arrange the various items of a series in serial order but the quantitative measurement of their values is difficult. For example, it is possible for a class teacher to arrange his students in ascending or descending order of intelligence, even though intelligence cannot be measured quantitatively. No doubt, the quantitative study about the intelligence of students can be made by holding an examination and assigning them marks, but this method can never be said to be infallible. There are many such attributes which are incapable of quantitative measurements, for example, honesty, character, morality, etc.

In such cases it is possible to rank the individuals in some order. The most intelligent individual may be given rank 1, the next rank 2 and so on.

The correlation coefficient between two series of ranks is called 'Rank Correlation Coefficient'. The formula for the coefficient of rank correlation as given by Charles Edward Spearman is

$$R = 1 - \frac{6\sum D^2}{n(n^2 - 1)} \quad \text{or} \quad R = 1 - \frac{6\sum D^2}{n^3 - n}$$

where D is the difference between the corresponding ranks of the two series and n is the number of individuals in each series.

Note 1. Instead of assigning ranks 1, 2, 3, ... from highest to lowest, we can also assign these ranks from the lowest to highest, i.e., rank 1 to the least intelligent, rank 2 to the next more intelligent, next rank 3, and so on.
Note 2. Remember that the algebraic sum of the rank differences is always 0, i.e., ΣD is always = 0. If it is not so, then some mistake has been committed at the time of assigning ranks.
Note 3. The interpretations of the values of R are the same as given on page 6 in Art. 17.05.
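Spearman's formula, together with the check ΣD = 0 from Note 2, can be sketched as below for the case in which no two items share a rank. The function names are illustrative; the two rank lists are the Statistics and Mathematics ranks of ten students used in the solved examples of this chapter, and the three marks passed to `ranks_from_scores` are hypothetical.

```python
def ranks_from_scores(scores):
    """Rank distinct scores with the highest score as rank 1 (no ties assumed)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

def spearman_r(ranks_1, ranks_2):
    """R = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), valid when there are no tied ranks."""
    n = len(ranks_1)
    d = [a - b for a, b in zip(ranks_1, ranks_2)]
    if sum(d) != 0:
        raise ValueError("rank differences must add up to 0; check the ranks")
    return 1 - 6 * sum(x * x for x in d) / (n * (n * n - 1))

statistics_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mathematics_ranks = [2, 4, 1, 5, 3, 9, 7, 10, 6, 8]
print(round(spearman_r(statistics_ranks, mathematics_ranks), 2))  # 0.76

print(ranks_from_scores([35, 23, 47]))  # hypothetical marks -> [2, 3, 1]
```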

17.12. Solved examples
We may come across two types of problems:
(a) When ranks are given
(b) When ranks are not given.

Type 1. When ranks are given
Working rule
Step 1. Compute D, the difference of the ranks.
Step 2. Compute D² and get the sum ΣD².
Step 3. Substitute the values in the formula.

Ex. 10. Following are the ranks obtained by 10 students in two subjects, Statistics and Mathematics. To what extent is the knowledge of the students in the two subjects related?
Statistics  : 1  2  3  4  5  6  7   8  9  10
Mathematics : 2  4  1  5  3  9  7  10  6   8

Sol.
 Rank in Statistics (x)   Rank in Mathematics (y)   D = x - y    D²
          1                        2                   -1         1
          2                        4                   -2         4
          3                        1                    2         4
          4                        5                   -1         1
          5                        3                    2         4
          6                        9                   -3         9
          7                        7                    0         0
          8                       10                   -2         4
          9                        6                    3         9
         10                        8                    2         4
ΣD² = 40

$$R = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 40}{10(100 - 1)} = 1 - \frac{240}{990} = 1 - 0.24 = +0.76.$$

Caution. When the ranks are already given as in the above example, do not commit the mistake of assigning new ranks.

Type 2. When ranks are not given
When no ranks are given, but actual data are given, then we should assign ranks. We can give ranks by taking the highest as 1 or the lowest value as 1, next to the highest (lowest) as 2 and follow the same procedure for both the variables.

Ex. 11. The marks obtained by the students in Physics and in Mathematics are as follows:
Marks in Physics     : 35  23  47  17  10  43  ...   6  28
Marks in Mathematics : 30  33  45  23   8  49   12   4  31
Compute their ranks in the two subjects and the coefficient of correlation of ranks. Interpret the result. (I.S.C. 2002)

Sol.
   x     y    Ranks in x (R₁)   Ranks in y (R₂)   D = R₁ - R₂    D²
  35    30          3                 5               -2          4
  23    33          5                 3                2          4
  47    45          1                 2               -1          1
  17    23          6                 6                0          0
  10     8          7                 8               -1          1
  43    49          2                 1                1          1
 ...    12          8                 7                1          1
   6     4          9                 9                0          0
  28    31          4                 4                0          0
ΣD² = 12

$$r = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 12}{9(81 - 1)} = 1 - \frac{72}{720} = 0.9.$$

The high value of r indicates a very high relationship. This means that the students who are good in Physics are good in Mathematics also.

Ex. 12. Ten competitors in a beauty contest are ranked by three judges in the following order:
First Judge  : 1  5  4  8  9  6  10  7  3  2
Second Judge : 4  8  7  6  5  9  10  3  2  1
Third Judge  : 6  7  8  1  5  10  9  2  3  4
Use the rank correlation coefficient to discuss which pair of judges have the nearest approach to common tastes in beauty.

Sol.
 First    Second   Third    D₁₂ =     D₁₃ =     D₂₃ =
 Judge    Judge    Judge    R₁ - R₂   R₁ - R₃   R₂ - R₃   D₁₂²   D₁₃²   D₂₃²
 (R₁)     (R₂)     (R₃)
   1        4        6        -3        -5        -2        9     25      4
   5        8        7        -3        -2         1        9      4      1
   4        7        8        -3        -4        -1        9     16      1
   8        6        1         2         7         5        4     49     25
   9        5        5         4         4         0       16     16      0
   6        9       10        -3        -4        -1        9     16      1
  10       10        9         0         1         1        0      1      1
   7        3        2         4         5         1       16     25      1
   3        2        3         1         0        -1        1      0      1
   2        1        4         1        -2        -3        1      4      9
ΣD₁₂² = 74, ΣD₁₃² = 156, ΣD₂₃² = 44

$$r_{12} = 1 - \frac{6\sum D_{12}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 74}{10 \times 99} = 0.55,$$

$$r_{13} = 1 - \frac{6\sum D_{13}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 156}{10 \times 99} = 0.05,$$

$$r_{23} = 1 - \frac{6\sum D_{23}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 44}{10 \times 99} = 0.73.$$

Since r₂₃ is maximum, we conclude that the pair of second and third judges has the nearest approach to common tastes in beauty.
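The comparison of the three judges can be sketched by computing the rank coefficient for every pair and picking the largest. The rank lists below are the three judges' rankings as read from Ex. 12; the dictionary keys and the function name are illustrative.

```python
from itertools import combinations

judges = {
    "first":  [1, 5, 4, 8, 9, 6, 10, 7, 3, 2],
    "second": [4, 8, 7, 6, 5, 9, 10, 3, 2, 1],
    "third":  [6, 7, 8, 1, 5, 10, 9, 2, 3, 4],
}

def spearman_r(ranks_1, ranks_2):
    """R = 1 - 6 * sum(D^2) / (n * (n^2 - 1)); no tied ranks assumed."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# One coefficient per pair of judges.
results = {
    (a, b): spearman_r(judges[a], judges[b])
    for a, b in combinations(judges, 2)
}
for pair, value in results.items():
    print(pair, round(value, 2))

closest_pair = max(results, key=results.get)
print("nearest approach to common tastes:", closest_pair)
```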

Ex. 13. The coefficient of rank correlation of marks obtained by 10 students in English and a second subject was found to be 0.5. It was later discovered that the difference in ranks in the two subjects obtained by one of the students was wrongly taken as 3 instead of 7. Find the correct coefficient of rank correlation. (I.S.C. 2009)

Sol. $r = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)}$. Substituting the given values, we get

$$0.5 = 1 - \frac{6\sum D^2}{10(100 - 1)} \;\Rightarrow\; \frac{6\sum D^2}{990} = 1 - 0.5 = 0.5 \;\Rightarrow\; 6\sum D^2 = 0.5 \times 990 \;\Rightarrow\; \sum D^2 = \frac{0.5 \times 990}{6} = 82.5.$$

∴ corrected value of ΣD² = 82.5 - 3² + 7² = 122.5.

∴ the correct value of

$$r = 1 - \frac{6 \times 122.5}{10 \times 99} = 1 - \frac{735}{990} = 1 - 0.74 = 0.26 \text{ approx.}$$

Ex. 14. Find out rank correlation from the following data:
S.N.             :  1   2   3   4   5   6   7   8   9  10
Rank differences : -2  -4  -1  +3  +2   0   ?  +3  +3  -2

Sol. First we find the unknown rank difference by using the fact that ΣD = 0. This gives: 2 + value of the unknown rank difference = 0, so the value of the unknown rank difference is -2.
Now we have

S.N. :  1   2   3   4   5   6   7   8   9  10    N = 10
D    : -2  -4  -1   3   2   0  -2   3   3  -2    ΣD = 0
D²   :  4  16   1   9   4   0   4   9   9   4    ΣD² = 60

$$r = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 60}{10(10^2 - 1)} = 1 - \frac{360}{990} = \frac{630}{990} = \frac{7}{11} = 0.636.$$
17.13. Correlation for tied ranks
If two or more individuals (or items) have the same score (i.e., tie) in either or both series, then the Spearman's rank correlation coefficient formula,

$$R = 1 - \frac{6\sum D^2}{n(n^2 - 1)},$$

fails to give the correct coefficient and a correction or modification in the formula becomes necessary because this formula is based on the supposition that ranks of various items are different and that no rank is given to more than one item.

The problem is solved by assigning a common rank to each of the individuals who are in tie. This common rank is the average of the ranks of these individuals. For example, we have the series

90  85  80  80  80  80  78  72  69  69  55  54

Rank 1 and rank 2 are assigned to the two values 90 and 85, and the next value 80 appears four times, at ranks 3, 4, 5 and 6; then the rank $\dfrac{3 + 4 + 5 + 6}{4} = 4.5$ will be assigned to each of the values 80 at the four places and the next lower value than 80, viz., 78 would be assigned the rank 7 because 6 ranks have already been assigned. Now, we see that the value 69 is repeated twice at ranks 9 and 10, then the common rank assigned to each value would be $\dfrac{9 + 10}{2} = 9.5$ and the next value 55 will have the rank 11 and the value 54 rank 12. Thus, we have

Series (X)    : 90  85   80   80   80   80  78  72   69   69  55  54
Rank assigned :  1   2  4.5  4.5  4.5  4.5   7   8  9.5  9.5  11  12

Note: We could have started by assigning ranks from the lowest value, i.e., rank 1 to 54, rank 2 to 55, rank $\dfrac{3 + 4}{2} = 3.5$ to each of the values 69, rank 5 to 72, rank 6 to 78, then rank $\dfrac{7 + 8 + 9 + 10}{4} = 8.5$ to each of the values 80, rank 11 to 85 and lastly rank 12 to 90. Then, the ranks would have been as under:

Series (X)    : 90  85   80   80   80   80  78  72   69   69  55  54
Rank assigned : 12  11  8.5  8.5  8.5  8.5   6   5  3.5  3.5   2   1

Correction factor: After assigning common rank to items with repeated values an adjustment or correction factor is added to the Spearman's rank correlation formula. If in a series there are m items whose ranks are common, then for correction $\dfrac{1}{12}(m^3 - m)$ is added to ΣD² for each repeating value in both the series. The modified formula, then, is given by

$$R = 1 - \frac{6\left[\sum D^2 + \dfrac{1}{12}(m_1^3 - m_1) + \dfrac{1}{12}(m_2^3 - m_2) + \dots\right]}{n(n^2 - 1)}$$

where $m_1, m_2, m_3, \dots$ are the numbers of times a value is repeated.
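The two steps described above, giving tied items the average of the ranks they occupy and adding (m³ - m)/12 for every repeated value, can be sketched as follows. The function names are illustrative, and ranks are assigned from the highest value downwards; ranking from the lowest value upwards would serve equally well as long as both series are treated alike.

```python
from collections import Counter

def average_ranks(values):
    """Highest value gets rank 1; tied values share the average of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # best rank occupied by this value
        count = ordered.count(v)          # number of items tied at this value
        ranks.append(first + (count - 1) / 2)
    return ranks

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value that occurs m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_tied(xs, ys):
    """Modified formula: the correction factors of both series are added to sum(D^2)."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    total = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * total / (n * (n * n - 1))

series = [90, 85, 80, 80, 80, 80, 78, 72, 69, 69, 55, 54]
print(average_ranks(series))
# [1.0, 2.0, 4.5, 4.5, 4.5, 4.5, 7.0, 8.0, 9.5, 9.5, 11.0, 12.0]
print(tie_correction(series))  # 5.5, i.e. (4^3 - 4)/12 + (2^3 - 2)/12
```

`spearman_tied` takes the raw values rather than the ranks, so that the tie counts of each series are taken from the data themselves.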

Ex. 15. The figures below give the numbers of passenger-carrying vehicles under repair, and the number of persons killed in train accidents in Great Britain in the years shown. Find the correlation between these figures and comment on the result. The method of ranks may be used. [H.S.C.]

Sol.
 Year    Vehicles under      No. of persons   Rank    Rank     D      D²
         repair (1000's) x   killed y         of x    of y
 1938         2.7                 30            7       9      -2      4
 1939         2.5                 27           10      10       0      0
 1940         2.7                 50            7       6       1      1
 1941         2.5                 76           10       2       8     64
 1942         2.5                 37           10       7       3      9
 1943         2.7                 14            7      11      -4     16
 1944         3.5                 34            5       8      -3      9
 1945         4.9                 75            3       3       0      0
 1946         5.4                 60           1.5      5     -3.5   12.25
 1947         5.4                121           1.5      1      0.5    0.25
 1948         3.8                 74            4       4       0      0
ΣD² = 115.5

In the x-series 5.4 occurs twice. The rank of 5.4 is $\dfrac{1 + 2}{2} = 1.5$. The next lower value 4.9 is assigned rank 3. Similarly, 2.7 occurs thrice. The rank of 2.7 is $\dfrac{6 + 7 + 8}{3} = 7$, and next 2.5 occurs thrice. The rank of 2.5 is $\dfrac{9 + 10 + 11}{3} = 10$.

Owing to these ranks the coefficient will have to be corrected by adding $\dfrac{1}{12}(m^3 - m)$ to the value of ΣD² (Art. 17.13). In respect of the x-series, this addition will be

$$\frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(3^3 - 3) = \frac{1}{2} + 2 + 2 = 4.5,$$

since 5.4 occurs twice, 2.7 occurs thrice and 2.5 occurs thrice. If there had been repeated values in the y-series also, we would have added the correction for them also.

$$\therefore\; r = 1 - \frac{6[(\sum D^2) + 4.5]}{n(n^2 - 1)} = 1 - \frac{6[115.5 + 4.5]}{11(11^2 - 1)} = 1 - \frac{6 \times 120}{1320} = 1 - 0.545 = 0.455.$$

Since the value 0.455 of r lies between 0.40 and 0.70, therefore, it signifies substantial or marked relationship. It means that ordinarily, the higher the number of passenger-carrying vehicles under repair, the higher is the number of persons killed in train accidents in Britain. But it does not follow that one is the cause of the other. There can be many other causes, for example, wear and tear, etc.

Ex. 16. Find out the rank correlation coefficient between the heights of fathers and sons from the following data:
Height of fathers in inches : 65  66  67  67  68  69  70  72
Height of sons in inches    : 67  68  65  68  72  72  69  71

Sol.
   x     y    Rank of x (R₁)   Rank of y (R₂)   D = R₁ - R₂     D²
  65    67         8                 7               1          1
  66    68         7                5.5             1.5        2.25
  67    65        5.5                8             -2.5        6.25
  67    68        5.5               5.5              0          0
  68    72         4                1.5             2.5        6.25
  69    72         3                1.5             1.5        2.25
  70    69         2                 4              -2          4
  72    71         1                 3              -2          4
ΣD² = 26

In the x-series, 67 occurs twice and its rank is $\dfrac{5 + 6}{2} = 5.5$.
In the y-series, 72 occurs twice and its rank is $\dfrac{1 + 2}{2} = 1.5$; 68 occurs twice and its rank is $\dfrac{5 + 6}{2} = 5.5$.

The correction factor for the x-series $= \dfrac{1}{12}(2^3 - 2) = 0.5$.
The correction factor for the y-series $= \dfrac{1}{12}(2^3 - 2) + \dfrac{1}{12}(2^3 - 2) = 0.5 + 0.5 = 1$.
∴ Total correction factor = 0.5 + 1 = 1.5.

$$R = 1 - \frac{6[\sum D^2 + 1.5]}{n(n^2 - 1)} = 1 - \frac{6(26 + 1.5)}{8(8^2 - 1)} = 1 - \frac{6 \times 27.5}{504} = \frac{504 - 165}{504} = \frac{339}{504} = 0.672619 = 0.673.$$
Ex. 17. Find the Spearman's rank coefficient of correlation from the following data:
X : 48  33  40   9  16  16  65  25  16  57
Y : 13  13  24   6  15   4  20   9   6  19

Sol. We will solve this question by assigning ranks from lowest to highest, i.e., rank 1 to the lowest value, rank 2 to the next value and so on.

 Series X   Series Y   Rank - X (R₁)   Rank - Y (R₂)   Rank Differences   Squares D²
                                                         D = R₁ - R₂
    48         13           8              5.5              + 2.5            6.25
    33         13           6              5.5              + 0.5            0.25
    40         24           7              10               - 3              9
     9          6           1              2.5              - 1.5            2.25
    16         15           3               7               - 4              16
    16          4           3               1               + 2              4
    65         20          10               9               + 1              1
    25          9           5               4               + 1              1
    16          6           3              2.5              + 0.5            0.25
    57         19           9               8               + 1              1
n = 10, ΣD = 0, ΣD² = 41

16 is repeated 3 times in series X and so $\dfrac{1}{12}(3^3 - 3)$ will be added to ΣD². In series Y, 13 is repeated 2 times and 6 is also repeated 2 times. So the factor $\dfrac{1}{12}(2^3 - 2)$ will be added twice to ΣD².

$$R = 1 - \frac{6\left[\sum D^2 + \dfrac{1}{12}(m_1^3 - m_1) + \dfrac{1}{12}(m_2^3 - m_2) + \dfrac{1}{12}(m_3^3 - m_3)\right]}{n(n^2 - 1)} = 1 - \frac{6\left[41 + \dfrac{1}{12}(3^3 - 3) + \dfrac{1}{12}(2^3 - 2) + \dfrac{1}{12}(2^3 - 2)\right]}{10(10^2 - 1)}$$

$$= 1 - \frac{6[41 + 2 + 0.5 + 0.5]}{1000 - 10} = 1 - \frac{6 \times 44}{990} = \frac{990 - 264}{990} = \frac{726}{990} = 0.733.$$
EXERCISE 17 (b)
Find Rank Correlation Coefficient by Spearman's formula in the following questions.

Type 1. (Based on the formula $r = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)}$)

1. The marks obtained by nine students in Physics and Mathematics are given below:
Physics     : 48  60  72  62  56  40  39  52  30
Mathematics : 62  78  65  70  38  54  60  32  31
Calculate Spearman's coefficient of rank correlation and interpret the result.