
Correlation Analysis:

According to Simpson and Kafka, "Correlation analysis deals with the association between two or more variables."

The problem of analysing the relationship between variables can be broken into the following steps:

1. Determining whether a relationship exists and, if so, measuring it.

2. Testing whether it is significant.

3. Establishing the cause-and-effect relation, if any.

It should be noted that detecting and analysing correlation between two statistical variables requires some relationship that associates the observations in pairs, one of each pair being a value of each of the two variables.

The correlation coefficient lies between -1 and +1.

Types of correlation

1. Positive or negative

2. Simple, partial and multiple

3. Linear and Non-linear

Methods of studying correlation

The various methods of studying correlation are:

1. Scatter Diagram method

2. Graphic method

3. Karl Pearson Coefficient of correlation

4. Concurrent deviation method.

The value of the correlation coefficient r lies between -1 and +1. r = +1 indicates a perfect positive correlation between the variables, and r = -1 indicates a perfect negative correlation. r = 0 means that there is no linear relationship between the two variables. In practice, values of exactly +1, -1 or 0 are rare.

Probable Error (P.E.):

The probable error is used to judge the reliability of the value of the coefficient of correlation; it depends on the conditions of random sampling.

The formula for estimating the probable error is as follows:

$$\text{P.E.} = 0.6745 \, \frac{1 - r^2}{\sqrt{N}}$$

where r is the coefficient of correlation and N is the number of pairs of observations.

If the value of r is less than the P.E, there is no evidence of correlation. It further reveals that the
value of correlation is not significant.

If the value of r is more than six times the probable error, the coefficient of correlation is practically
certain. It further reveals that the correlation coefficient is significant.

Adding the probable error to the coefficient of correlation and subtracting it from the coefficient give, respectively, the upper and lower limits within which the correlation coefficient of the population is expected to lie. This can be written as ρ = r ± P.E., where ρ is the correlation coefficient of the population.
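The rules above can be sketched in a few lines of Python. This is only an illustration: the function names, the label "inconclusive" for values between 1 and 6 times the P.E., the use of |r| for negative coefficients, and the sample values r = 0.8, N = 25 are assumptions made for the example, not part of the notes.

```python
import math

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def interpret(r, n):
    """Apply the 'less than P.E.' and 'more than 6 x P.E.' rules.

    Assumption: |r| is compared with the P.E. so that negative
    coefficients are handled the same way as positive ones.
    """
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"  # hypothetical label for the in-between case

# Hypothetical sample values, chosen only to show the calculation.
r, n = 0.8, 25
pe = probable_error(r, n)
print(round(pe, 4))                        # 0.0486
print(interpret(r, n))                     # significant
print(round(r - pe, 4), round(r + pe, 4))  # limits for rho
```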

The probable error can be used only when the following conditions are satisfied:

1. The data must approximately follow a normal distribution.

2. The sample must be selected in an unbiased manner and the individual items must be independent.

3. The statistical measure for which P.E is measured must have been estimated for a sample.

Karl Pearson’s Coefficient of Correlation

There are several methods of computing Karl Pearson's coefficient of correlation. They are:

1. Deviation taken from Actual Mean

2. Direct method

3. Deviation taken from Assumed Mean

4. Grouped data

1. Deviation taken from Actual Mean

1. The following table gives the index of industrial production and the number of registered unemployed (in hundred thousands). Calculate the value of the coefficient of correlation.

Year                 : 2011  2012  2013  2014  2015  2016  2017  2018
Index of production  :  101   103   105   108   104   110   103    98
No. unemployed       :   14    11    12    12    13    13    18    27

Solution:

Mean of X = 832/8 = 104, mean of Y = 120/8 = 15. Let x = X - 104 and y = Y - 15.

  X     Y     x     y    x2    y2    xy
101    14    -3    -1     9     1     3
103    11    -1    -4     1    16     4
105    12     1    -3     1     9    -3
108    12     4    -3    16     9   -12
104    13     0    -2     0     4     0
110    13     6    -2    36     4   -12
103    18    -1     3     1     9    -3
 98    27    -6    12    36   144   -72
Total                   100   196   -95

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} = \frac{-95}{\sqrt{100 \times 196}} = \frac{-95}{140} = -0.679$$

There is a high negative correlation between the index of production and the number of unemployed.
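As a cross-check, the deviation-from-actual-mean calculation can be sketched in Python. The function name is made up for this illustration; the data are the eight pairs from the table.

```python
import math

def pearson_from_deviations(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y as deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

production = [101, 103, 105, 108, 104, 110, 103, 98]
unemployed = [14, 11, 12, 12, 13, 13, 18, 27]

r = pearson_from_deviations(production, unemployed)
print(round(r, 3))  # -0.679
```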

2. Direct Method

This method is convenient when the values of the variables are small.

2. Calculate the coefficient of correlation from the data given below.

X :  9   8   7   6   5   4   3   2   1
Y : 14  17  13  14  10  13   9   7  11

Solution:

  X     Y    X2     Y2    XY
  9    14    81    196   126
  8    17    64    289   136
  7    13    49    169    91
  6    14    36    196    84
  5    10    25    100    50
  4    13    16    169    52
  3     9     9     81    27
  2     7     4     49    14
  1    11     1    121    11
Total
 45   108   285   1370   591

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$

$$r = \frac{9 \times 591 - 45 \times 108}{\sqrt{9 \times 285 - 45^2}\,\sqrt{9 \times 1370 - 108^2}} = \frac{459}{\sqrt{540}\,\sqrt{666}} = \frac{459}{599.70} = 0.765$$

There is a high positive correlation between X and Y.
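The direct-method formula can be evaluated from the raw sums with a short Python sketch. The function name is an assumption for this illustration; the data are the nine (X, Y) pairs given in the problem, for which the formula evaluates to about 0.765.

```python
import math

def pearson_direct(xs, ys):
    """r = (N*SXY - SX*SY) / (sqrt(N*SXX - SX^2) * sqrt(N*SYY - SY^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

X = [9, 8, 7, 6, 5, 4, 3, 2, 1]
Y = [14, 17, 13, 14, 10, 13, 9, 7, 11]

r = pearson_direct(X, Y)
print(sum(X), sum(Y))  # 45 108
print(round(r, 3))     # 0.765
```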

3. Assumed Mean Method

The following table gives the distribution of items of production and the number of defective items among them, according to size groups. Find the correlation coefficient between size and defect in quality.

Size group             : 16-17  17-18  18-19  19-20  20-21  21-22
No. of items           :   150    220    290    310    350    250
No. of defective items :   100    112    120    130    130     64

Solution:

Let the mid-point of the size group be denoted by X and the percentage of defective items by Y. Deviations are taken from the assumed means 18.5 for X and 41 for Y, so dx = X - 18.5 and dy = Y - 41.

Size    Mid-value (X)      Y     dx      dy    dx2       dy2      dxdy
16-17       16.5        66.67    -2   25.67      4    658.95    -51.34
17-18       17.5        50.91    -1    9.91      1     98.21     -9.91
18-19       18.5        41.38     0    0.38      0      0.14      0.00
19-20       19.5        41.94     1    0.94      1      0.88      0.94
20-21       20.5        37.14     2   -3.86      4     14.90     -7.72
21-22       21.5        25.60     3  -15.40      9    237.16    -46.20
Total                             3   17.64     19   1010.24   -114.23

$$r = \frac{N\sum dxdy - \sum dx \sum dy}{\sqrt{N\sum dx^2 - (\sum dx)^2}\,\sqrt{N\sum dy^2 - (\sum dy)^2}}$$

$$r = \frac{6 \times (-114.23) - 3 \times 17.64}{\sqrt{6 \times 19 - 3^2}\,\sqrt{6 \times 1010.24 - 17.64^2}} = \frac{-685.38 - 52.92}{\sqrt{105}\,\sqrt{5750.27}}$$

$$r = \frac{-738.30}{777.03} = -0.950$$

There is a high negative correlation between size and the percentage of defective items.
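A minimal Python sketch of the assumed-mean calculation is given below. The function name and the choice of assumed means (18.5 for the mid-values and 41 for the percentages) are assumptions for this illustration; the percentages are computed from the item counts rather than rounded to two decimals, so intermediate sums can differ slightly in the last digit from a hand calculation.

```python
import math

def pearson_assumed_mean(xs, ys, a, b):
    """Pearson's r from deviations dx = X - a, dy = Y - b about assumed means a and b."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dx2 = sum(d * d for d in dx)
    s_dy2 = sum(d * d for d in dy)
    s_dxdy = sum(p * q for p, q in zip(dx, dy))
    numerator = n * s_dxdy - s_dx * s_dy
    denominator = (math.sqrt(n * s_dx2 - s_dx ** 2)
                   * math.sqrt(n * s_dy2 - s_dy ** 2))
    return numerator / denominator

mid_values = [16.5, 17.5, 18.5, 19.5, 20.5, 21.5]
items = [150, 220, 290, 310, 350, 250]
defective = [100, 112, 120, 130, 130, 64]
pct_defective = [100 * d / n for d, n in zip(defective, items)]

r = pearson_assumed_mean(mid_values, pct_defective, a=18.5, b=41)
print(round(r, 2))  # -0.95
```

Because the correction terms remove the effect of the chosen origin, any other assumed means give the same r; the choice only affects how convenient the arithmetic is by hand.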

Grouped Data

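Calculate Karl Pearson's coefficient of correlation from the following data.

X classes and their mid-values (m), with step deviations dx:

X class : 30000-40000  40000-50000  50000-60000  60000-70000  70000-80000
m       :      35,000       45,000       55,000       65,000       75,000
dx      :          -2           -1            0            1            2

Each cell shows the frequency f, with f·dx·dy in parentheses.

Y class    m    dy | dx=-2     dx=-1     dx=0      dx=1     dx=2     |   f   fdy  fdy2  fdxdy
15-20    17.5   -1 | -         -         -         2 (-2)   8 (-16)  |  10   -10    10    -18
20-25    22.5    0 | -         3 (0)     10 (0)    3 (0)    4 (0)    |  20     0     0      0
25-30    27.5    1 | 6 (-12)   7 (-7)    11 (0)    6 (6)    -        |  30    30    30    -13
30-35    32.5    2 | 4 (-16)   9 (-18)   20 (0)    7 (14)   -        |  40    80   160    -20
f                  | 10        19        41        18       12       | 100   100   200    -51
fdx                | -20       -19       0         18       24       |   3
fdx2               | 40        19        0         18       48       | 125
fdxdy              | -28       -25       0         18       -16      | -51

$$r = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{\sqrt{N\sum f\,dx^2 - (\sum f\,dx)^2}\,\sqrt{N\sum f\,dy^2 - (\sum f\,dy)^2}}$$

$$r = \frac{100 \times (-51) - 3 \times 100}{\sqrt{100 \times 125 - 3^2}\,\sqrt{100 \times 200 - 100^2}} = \frac{-5100 - 300}{111.763 \times 100} = \frac{-5400}{11176.3} = -0.483$$

$$\text{P.E.} = 0.6745 \, \frac{1 - r^2}{\sqrt{N}} = \frac{0.6745 \times 0.767}{\sqrt{100}} = 0.052$$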
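The grouped-data sums can be checked mechanically with a short Python sketch. It assumes the bivariate frequency table is stored as a list of rows (one row per Y class, one column per X class) and that empty cells ("-") mean a frequency of zero; the function name and layout are choices made for this illustration. For the cell frequencies in the table, it evaluates to about -0.483.

```python
import math

def pearson_grouped(freq, dx, dy):
    """Pearson's r for a bivariate frequency table using step deviations.

    freq[i][j] is the frequency of the cell with deviations dy[i], dx[j].
    """
    n = s_fdx = s_fdy = s_fdx2 = s_fdy2 = s_fdxdy = 0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            n += f
            s_fdx += f * dx[j]
            s_fdy += f * dy[i]
            s_fdx2 += f * dx[j] ** 2
            s_fdy2 += f * dy[i] ** 2
            s_fdxdy += f * dx[j] * dy[i]
    numerator = n * s_fdxdy - s_fdx * s_fdy
    denominator = (math.sqrt(n * s_fdx2 - s_fdx ** 2)
                   * math.sqrt(n * s_fdy2 - s_fdy ** 2))
    return numerator / denominator

dx = [-2, -1, 0, 1, 2]   # X classes 30000-40000 ... 70000-80000
dy = [-1, 0, 1, 2]       # Y classes 15-20 ... 30-35
freq = [
    [0, 0, 0, 2, 8],     # 15-20 (empty cells taken as 0)
    [0, 3, 10, 3, 4],    # 20-25
    [6, 7, 11, 6, 0],    # 25-30
    [4, 9, 20, 7, 0],    # 30-35
]

r = pearson_grouped(freq, dx, dy)
print(round(r, 3))  # -0.483
```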

Rank Correlation

Where ranks are given

Two boys were asked to rank 7 different brands of mobiles. The ranks given by them are as follows:

Brand of mobile : A  B  C  D  E  F  G
Bhavesh         : 1  2  3  4  7  5  6
Suresh          : 3  1  4  2  6  5  7

Calculate Spearman's rank correlation coefficient.

Solution:

Let R1 be the rank given by Bhavesh and R2 the rank given by Suresh.

Brand   R1   R2   D2 = (R1-R2)2
A        1    3    4
B        2    1    1
C        3    4    1
D        4    2    4
E        7    6    1
F        5    5    0
G        6    7    1
Total             12

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 12}{7(7^2 - 1)} = 1 - 0.214 = 0.786$$

There is a high positive correlation between the rankings given by the two boys.
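When the ranks are already given, the formula reduces to a few lines of Python. The function name below is an assumption for this sketch; the ranks are those of the two boys.

```python
def spearman_from_ranks(r1, r2):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), valid when there are no tied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

bhavesh = [1, 2, 3, 4, 7, 5, 6]
suresh = [3, 1, 4, 2, 6, 5, 7]

R = spearman_from_ranks(bhavesh, suresh)
print(round(R, 3))  # 0.786
```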

Where Ranks are not given


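Quotations of index numbers of security prices of a certain joint stock company are given below. Calculate the rank correlation coefficient between debenture price and share price.

Year   Debenture Price   Share Price
1            97               73
2            99               85
3            98               78
4            96               75
5            94               77
6            95               67
7            93               83

Solution:

Ranks are assigned in descending order, so the highest price receives rank 1.

Year   Debenture Price   R1   Share Price   R2   D2 = (R1-R2)2
1            97           3        73         6     9
2            99           1        85         1     0
3            98           2        78         3     1
4            96           4        75         5     1
5            94           6        77         4     4
6            95           5        67         7     4
7            93           7        83         2    25
Total                                              44

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 44}{7(7^2 - 1)} = 1 - \frac{264}{336} = 0.214$$

There is a low positive correlation between debenture price and share price.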
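When ranks are not given, they have to be assigned before the formula is applied. The sketch below assumes that the highest value receives rank 1 and that no value is repeated; the function names are made up for this illustration. For the seven pairs of prices in the problem, it evaluates to about 0.214.

```python
def ranks_descending(values):
    """Rank 1 for the largest value. Assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_from_ranks(r1, r2):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

debenture = [97, 99, 98, 96, 94, 95, 93]
share = [73, 85, 78, 75, 77, 67, 83]

r1 = ranks_descending(debenture)
r2 = ranks_descending(share)
print(r1)  # [3, 1, 2, 4, 6, 5, 7]
print(r2)  # [6, 1, 3, 5, 4, 7, 2]

R = spearman_from_ranks(r1, r2)
print(round(R, 3))  # 0.214
```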

Equal Ranks

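Compute Spearman's rank correlation for the following observations.

Candidate :  1   2   3   4   5   6   7   8
Judge X   : 21  23  29  24  31  31  24  25
Judge Y   : 29  25  25  26  27  28  33  31

Solution:

Ranks are assigned in descending order. Tied values share the average of the ranks they would otherwise occupy.

Candidate   Judge X   Judge Y    R1    R2   D2 = (R1-R2)2
1             21        29        8     3    25
2             23        25        7   7.5     0.25
3             29        25        3   7.5    20.25
4             24        26      5.5     6     0.25
5             31        27      1.5     5    12.25
6             31        28      1.5     4     6.25
7             24        33      5.5     1    20.25
8             25        31        4     2     4
Total                                        88.5

Here m is the number of items sharing a tied rank. There are three ties, each with m = 2: the values 31 and 24 for Judge X and the value 25 for Judge Y.

$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right]}{N(N^2 - 1)}$$

$$R = 1 - \frac{6\left[88.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{8(8^2 - 1)} = 1 - \frac{6 \times 90}{504} = -0.071$$

There is a very weak negative correlation between the ranks given by Judge X and Judge Y.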
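The tie-corrected formula can be sketched in Python as follows. The function names are assumptions for this illustration; tied values receive the average of the ranks they occupy, and one correction term (m^3 - m)/12 is added for every group of m tied values in either series. For the judges' scores above, it evaluates to about -0.071.

```python
from collections import Counter

def average_ranks(values):
    """Descending ranks; tied values share the average of their positions."""
    counts = Counter(values)
    ranks = []
    for v in values:
        higher = sum(1 for w in values if w > v)   # values ranked ahead of v
        ranks.append(higher + (counts[v] + 1) / 2)
    return ranks

def spearman_with_ties(xs, ys):
    """R = 1 - 6 * (sum(D^2) + sum((m^3 - m) / 12)) / (N * (N^2 - 1))."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum((m ** 3 - m) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

judge_x = [21, 23, 29, 24, 31, 31, 24, 25]
judge_y = [29, 25, 25, 26, 27, 28, 33, 31]

print(average_ranks(judge_x))  # [8.0, 7.0, 3.0, 5.5, 1.5, 1.5, 5.5, 4.0]
R = spearman_with_ties(judge_x, judge_y)
print(round(R, 3))  # -0.071
```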
