Asset-V1 VIT+MBA001+2020+type@asset+block@Week 3 Content

Correlation Analysis:
According to Simpson and Kafka, “Correlation analysis deals with the association between two or more variables.”
The problem of analysing the relationship between variables should be broken into the following steps:
1. Determining whether a relationship exists and, if so, measuring it.
2. Testing whether it is significant.
3. Establishing the cause-and-effect relationship, if any.
It should be noted that the detection and analysis of correlation between two statistical variables requires a relationship of some kind that associates the observations in pairs, one of each pair being a value of each of the two variables.
Types of correlation
1. Positive or negative
2. Simple, partial and multiple
3. Linear and Non-linear
Methods of studying correlation
The various methods of studying correlation are:
1. Scatter Diagram method
2. Graphic method
3. Karl Pearson Coefficient of correlation
4. Concurrent deviation method.
The value of the correlation coefficient r lies between -1 and +1. When r = +1, there is a perfect positive correlation between the variables. When r = -1, there is a perfect negative correlation between the variables. When r = 0, there is no linear relationship between the two variables. In practice, values of exactly +1, -1 and 0 are rare.
Probable Error (P.E.):
The probable error is used to judge the reliability of the value of the coefficient, in so far as it depends on the conditions of random sampling.
The formula for estimating the probable error of the correlation coefficient is as follows:
$\text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}$, where r is the coefficient of correlation and N is the number of pairs of observations.
If the value of r is less than the P.E., there is no evidence of correlation; the value of r is not significant.
If the value of r is more than six times the probable error, the existence of correlation is practically certain; the value of r is significant.
Adding the probable error to the coefficient of correlation, and subtracting it from the coefficient, gives respectively the upper and lower limits within which the correlation coefficient of the population is expected to lie. This can be written as $\rho = r \pm \text{P.E.}$, where ρ is the correlation coefficient of the population.
The probable error can be used only when the following conditions are satisfied:
1. The data must approximately follow a normal distribution.
2. The sample must be selected in an unbiased manner and the individual items must be independent.
3. The statistical measure for which the P.E. is computed must have been estimated from a sample.
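The probable-error rules above can be sketched in a few lines of Python. This is only an illustration: the function names, the "inconclusive" label for the range the text does not cover, the use of the absolute value of r, and the sample values r = 0.8 and N = 25 are all assumptions made for the example.

```python
import math

def probable_error(r: float, n: int) -> float:
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def interpret(r: float, n: int) -> str:
    """Apply the two rules of thumb to |r| (using |r| is an assumption)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"  # the notes give no rule for this range

# Hypothetical values, chosen only for illustration.
r, n = 0.8, 25
pe = probable_error(r, n)
print(round(pe, 4))                          # 0.0486
print(interpret(r, n))                       # significant
print(round(r - pe, 4), round(r + pe, 4))    # lower and upper limits for rho
```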
Karl Pearson’s Coefficient of Correlation
There are several methods of computing Karl Pearson’s coefficient of correlation. They are:
1. Deviation taken from Actual Mean
2. Direct method
3. Deviation taken from Assumed Mean
4. Grouped data
1. Deviation taken from Actual Mean
Example 1: The following table gives the index of industrial production and the number of registered unemployed (in hundred thousands). Calculate the value of the coefficient of correlation.
Year                : 2011 2012 2013 2014 2015 2016 2017 2018
Index of production :  101  103  105  108  104  110  103   98
No. unemployed      :   14   11   12   12   13   13   18   27
Solution:
Let X be the index of production and Y the number unemployed. The means are $\bar{X} = 832/8 = 104$ and $\bar{Y} = 120/8 = 15$.
    X     Y   x = X - 104   y = Y - 15     x²     y²     xy
  101    14            -3           -1      9      1      3
  103    11            -1           -4      1     16      4
  105    12             1           -3      1      9     -3
  108    12             4           -3     16      9    -12
  104    13             0           -2      0      4      0
  110    13             6           -2     36      4    -12
  103    18            -1            3      1      9     -3
   98    27            -6           12     36    144    -72
Total                   0            0    100    196    -95
$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{-95}{\sqrt{100 \times 196}} = \frac{-95}{140} = -0.679$
There is a fairly high negative correlation between the index of production and the number unemployed.
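As a cross-check of the worked example, the actual-mean method can be written as a short Python sketch. The function and variable names are illustrative choices, not part of the original notes.

```python
import math

production = [101, 103, 105, 108, 104, 110, 103, 98]   # X
unemployed = [14, 11, 12, 12, 13, 13, 18, 27]          # Y

def pearson_from_actual_mean(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y as deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

r = pearson_from_actual_mean(production, unemployed)
print(round(r, 3))  # -0.679
```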
2. Direct Method
This method is convenient when the values of the variables are small, because it works directly with the raw values and needs no deviations.
Example 2: Calculate the coefficient of correlation from the data given below.
X:  9  8  7  6  5  4  3  2  1
Y: 14 17 13 14 10 13  9  7 11
Solution:
    X     Y     X²     Y²     XY
    9    14     81    196    126
    8    17     64    289    136
    7    13     49    169     91
    6    14     36    196     84
    5    10     25    100     50
    4    13     16    169     52
    3     9      9     81     27
    2     7      4     49     14
    1    11      1    121     11
Total
   45   108    285   1370    591
$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$
$r = \frac{9 \times 591 - 45 \times 108}{\sqrt{9 \times 285 - 45^2}\,\sqrt{9 \times 1370 - 108^2}} = \frac{459}{\sqrt{540}\,\sqrt{666}} = \frac{459}{599.70}$
$= 0.765$
There is a high positive correlation between X and Y.
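The raw-sum formula is easy to check mechanically. The sketch below, with illustrative names, computes the five column totals for this data set and then r; it is a plain transcription of the formula rather than a library routine.

```python
import math

def pearson_direct(xs, ys):
    """r from raw sums: (N*SXY - SX*SY) / (sqrt(N*SXX - SX^2) * sqrt(N*SYY - SY^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

xs = [9, 8, 7, 6, 5, 4, 3, 2, 1]
ys = [14, 17, 13, 14, 10, 13, 9, 7, 11]

# Column totals used in the formula: sum X, sum Y, sum XY.
print(sum(xs), sum(ys), sum(x * y for x, y in zip(xs, ys)))
r = pearson_direct(xs, ys)
print(round(r, 3))
```

Both X and Y fall together as one reads across the data, so the sign of the result should be positive.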
3. Assumed Mean Method
The following table gives the distribution of items of production and the number of defective items among them, according to size group. Find the correlation coefficient between size and defect in quality, and its probable error.
Size group             : 16-17 17-18 18-19 19-20 20-21 21-22
No. of items           :   150   220   290   310   350   250
No. of defective items :   100   112   120   130   130    64
Solution:
Let X denote the mid-point of the size group and Y the percentage of defective items. Deviations are taken from the assumed means 18.5 for X and 41 for Y, so that dx = X - 18.5 and dy = Y - 41.
Size     Mid-value (X)   Y (%)    dx       dy    dx²       dy²      dxdy
16-17         16.5       66.67    -2    25.67      4    658.95    -51.34
17-18         17.5       50.91    -1     9.91      1     98.21     -9.91
18-19         18.5       41.38     0     0.38      0      0.14      0.00
19-20         19.5       41.94     1     0.94      1      0.88      0.94
20-21         20.5       37.14     2    -3.86      4     14.90     -7.72
21-22         21.5       25.60     3   -15.40      9    237.16    -46.20
Total                              3    17.64     19   1010.24   -114.23
$r = \frac{N\sum dx\,dy - \sum dx \sum dy}{\sqrt{N\sum dx^2 - (\sum dx)^2}\,\sqrt{N\sum dy^2 - (\sum dy)^2}}$
$= \frac{6 \times (-114.23) - 3 \times 17.64}{\sqrt{6 \times 19 - 3^2}\,\sqrt{6 \times 1010.24 - 17.64^2}}$
$= \frac{-685.38 - 52.92}{\sqrt{105}\,\sqrt{5750.27}} = \frac{-738.30}{10.247 \times 75.83}$
$= \frac{-738.30}{777.03} = -0.950$
There is a high negative correlation between size group and percentage of defectives.
$\text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - 0.950^2}{\sqrt{6}} = 0.027$
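A useful property of the assumed-mean method is that r does not depend on which origins are chosen, provided the $(\sum dx)^2$ and $(\sum dy)^2$ terms are kept in the denominator. The following sketch illustrates this on the size and defectives data; the function name and the second pair of origins (19.0 and 40) are arbitrary choices made for the illustration.

```python
import math

def pearson_assumed_mean(xs, ys, a_x, a_y):
    """Pearson's r from deviations dx = X - a_x and dy = Y - a_y."""
    n = len(xs)
    dx = [x - a_x for x in xs]
    dy = [y - a_y for y in ys]
    numerator = n * sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy)
    den_x = math.sqrt(n * sum(p * p for p in dx) - sum(dx) ** 2)
    den_y = math.sqrt(n * sum(q * q for q in dy) - sum(dy) ** 2)
    return numerator / (den_x * den_y)

mid_values = [16.5, 17.5, 18.5, 19.5, 20.5, 21.5]
items = [150, 220, 290, 310, 350, 250]
defective = [100, 112, 120, 130, 130, 64]
# Y is the percentage of defective items in each size group.
pct_defective = [100 * d / n for d, n in zip(defective, items)]

r_a = pearson_assumed_mean(mid_values, pct_defective, 18.5, 41)
r_b = pearson_assumed_mean(mid_values, pct_defective, 19.0, 40)  # different, arbitrary origins
print(round(r_a, 3), round(r_b, 3))  # the two values agree
```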
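4. Grouped Data
Calculate Karl Pearson’s coefficient of correlation from the following data. In the table, m is the class mid-value, dx and dy are step deviations, and each cell shows the frequency f followed by the product f·dx·dy in brackets.
Y class     m   dy    30000-40000  40000-50000  50000-60000  60000-70000  70000-80000      f   fdy  fdy²  fdxdy
                       m = 35,000   m = 45,000   m = 55,000   m = 65,000   m = 75,000
                          dx = -2      dx = -1       dx = 0       dx = 1       dx = 2
15-20    17.5   -1              -            -            -       2 (-2)      8 (-16)     10   -10    10    -18
20-25    22.5    0              -        3 (0)       10 (0)        3 (0)        4 (0)     20     0     0      0
25-30    27.5    1        6 (-12)       7 (-7)       11 (0)        6 (6)            -     30    30    30    -13
30-35    32.5    2        4 (-16)      9 (-18)       20 (0)       7 (14)            -     40    80   160    -20
f                              10           19           41           18           12    100   100   200    -51
fdx                           -20          -19            0           18           24      3
fdx²                           40           19            0           18           48    125
fdxdy                         -28          -25            0           18          -16    -51
Here N = ∑f = 100, ∑fdx = 3, ∑fdy = 100, ∑fdx² = 125, ∑fdy² = 200 and ∑fdxdy = -51.
$r = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{\sqrt{N\sum f\,dx^2 - (\sum f\,dx)^2}\,\sqrt{N\sum f\,dy^2 - (\sum f\,dy)^2}}$
$= \frac{100 \times (-51) - 3 \times 100}{\sqrt{100 \times 125 - 3^2}\,\sqrt{100 \times 200 - 100^2}}$
$= \frac{-5100 - 300}{111.763 \times 100} = \frac{-5400}{11176.3}$
$= -0.483$
$\text{Probable error (P.E.)} = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}$
$= 0.6745 \times \frac{0.767}{\sqrt{100}}$
$= 0.052$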
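For a two-way frequency table the marginal sums are tedious to build by hand, so a small script is a helpful check. The sketch below stores the cell frequencies as a list of rows (an empty cell is entered as 0) and uses the step deviations dx = (m - 55,000)/10,000 and dy = (m - 22.5)/5; the layout and names are assumptions made for this illustration.

```python
import math

dx = [-2, -1, 0, 1, 2]          # step deviations of the X classes
dy = [-1, 0, 1, 2]              # step deviations of the Y classes
freq = [                        # rows: Y classes, columns: X classes
    [0, 0, 0, 2, 8],            # 15-20
    [0, 3, 10, 3, 4],           # 20-25
    [6, 7, 11, 6, 0],           # 25-30
    [4, 9, 20, 7, 0],           # 30-35
]

def grouped_pearson(freq, dx, dy):
    """Return (r, sum of f*dx*dy) for a bivariate frequency table."""
    n = sum(sum(row) for row in freq)
    col_f = [sum(row[j] for row in freq) for j in range(len(dx))]
    row_f = [sum(row) for row in freq]
    s_fdx = sum(f * d for f, d in zip(col_f, dx))
    s_fdx2 = sum(f * d * d for f, d in zip(col_f, dx))
    s_fdy = sum(f * d for f, d in zip(row_f, dy))
    s_fdy2 = sum(f * d * d for f, d in zip(row_f, dy))
    # Each cell contributes f * dx * dy; e.g. f = 7, dx = 1, dy = 2 gives +14.
    s_fdxdy = sum(freq[i][j] * dx[j] * dy[i]
                  for i in range(len(dy)) for j in range(len(dx)))
    numerator = n * s_fdxdy - s_fdx * s_fdy
    denominator = (math.sqrt(n * s_fdx2 - s_fdx ** 2)
                   * math.sqrt(n * s_fdy2 - s_fdy ** 2))
    return numerator / denominator, s_fdxdy

r, s_fdxdy = grouped_pearson(freq, dx, dy)
print(s_fdxdy, round(r, 3))
```

Because r is unaffected by a change of origin and scale, working in step deviations gives the same coefficient as working with the mid-values themselves.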
Rank Correlation
Where ranks are given
Two boys were asked to rank 7 different brands of mobiles. The ranks given by them are as follows:
Brand of mobile : A B C D E F G
Bhavesh         : 1 2 3 4 7 5 6
Suresh          : 3 1 4 2 6 5 7
Calculate Spearman’s rank correlation coefficient.
Solution:
Brand   X (R1)   Y (R2)   D² = (R1 - R2)²
  A        1        3            4
  B        2        1            1
  C        3        4            1
  D        4        2            4
  E        7        6            1
  F        5        5            0
  G        6        7            1
Total                           12
$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$
$= 1 - \frac{6 \times 12}{7(7^2 - 1)}$
$= 1 - 0.214$
$= 0.786$
There is a high positive correlation between the ranks given by Bhavesh and Suresh.
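When the ranks are already given, Spearman's formula reduces to a single sum of squared rank differences. A minimal Python sketch (the function name is illustrative, and it assumes there are no tied ranks):

```python
def spearman_from_ranks(r1, r2):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), for ranks without ties."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

bhavesh = [1, 2, 3, 4, 7, 5, 6]
suresh = [3, 1, 4, 2, 6, 5, 7]
print(round(spearman_from_ranks(bhavesh, suresh), 3))  # 0.786
```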
Where ranks are not given
Quotations of index numbers of security prices of a certain joint stock company are given below. Calculate Spearman’s rank correlation coefficient.
Year   Debenture price   Share price
  1          97               73
  2          99               85
  3          98               78
  4          96               75
  5          94               77
  6          95               67
  7          93               83
Solution:
Ranks are assigned within each series, with rank 1 given to the highest value.
Year   Debenture price   R1   Share price   R2   D² = (R1 - R2)²
  1          97           3        73        6          9
  2          99           1        85        1          0
  3          98           2        78        3          1
  4          96           4        75        5          1
  5          94           6        77        4          4
  6          95           5        67        7          4
  7          93           7        83        2         25
Total                                                  44
$R = 1 - \frac{6 \times 44}{7(7^2 - 1)} = 1 - \frac{264}{336} = 1 - 0.786 = 0.214$
There is a low positive correlation between debenture price and share price.
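When only the raw values are available, the ranking step comes first. The sketch below ranks each series in descending order (rank 1 for the highest value) and then applies Spearman's formula; the helper names are illustrative, and the ranking helper assumes that no two values in a series are equal.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Rank both series, then apply R = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    r1, r2 = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2

debenture = [97, 99, 98, 96, 94, 95, 93]
share = [73, 85, 78, 75, 77, 67, 83]

print(ranks_descending(debenture))   # ranks R1
print(ranks_descending(share))       # ranks R2
rho, sum_d2 = spearman(debenture, share)
print(sum_d2, round(rho, 3))
```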
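Equal Ranks
Compute Spearman’s rank correlation for the following observations.
Candidate :  1  2  3  4  5  6  7  8
Judge X   : 21 23 29 24 31 31 24 25
Judge Y   : 29 25 25 26 27 28 33 31
Solution:
When two or more items have the same value, each is given the average of the ranks they would otherwise occupy. For every group of m tied items, the correction (m³ - m)/12 is added to ∑D².
Candidate   Judge X   Judge Y    R1    R2   D² = (R1 - R2)²
    1          21        29      8     3         25
    2          23        25      7     7.5        0.25
    3          29        25      3     7.5       20.25
    4          24        26      5.5   6          0.25
    5          31        27      1.5   5         12.25
    6          31        28      1.5   4          6.25
    7          24        33      5.5   1         20.25
    8          25        31      4     2          4
Total                                            88.5
$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{N(N^2 - 1)}$
There are three groups of tied values (31 and 24 for Judge X, 25 for Judge Y), each with m = 2, so each correction is (2³ - 2)/12 = 0.5.
$R = 1 - \frac{6\left[88.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{8(8^2 - 1)} = 1 - \frac{6 \times 90}{504} = 1 - 1.071$
$= -0.071$
There is a very slight negative correlation between the ranks given by Judge X and Judge Y.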
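The tied-rank procedure has two parts that are easy to get wrong by hand: assigning average ranks and counting the tie groups for the correction. A hedged sketch of both, with illustrative function names, applied to the judges' scores:

```python
from collections import Counter

def average_ranks(values):
    """Descending ranks; tied values share the average of their positions."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1          # best position in the tie group
        count = order.count(v)              # size of the tie group
        ranks.append(first + (count - 1) / 2)
    return ranks

def spearman_with_ties(xs, ys):
    """Spearman's R with the (m^3 - m)/12 correction for each tie group."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum((m ** 3 - m) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1)), sum_d2, correction

judge_x = [21, 23, 29, 24, 31, 31, 24, 25]
judge_y = [29, 25, 25, 26, 27, 28, 33, 31]

rho, sum_d2, correction = spearman_with_ties(judge_x, judge_y)
print(average_ranks(judge_x))
print(average_ranks(judge_y))
print(sum_d2, correction, round(rho, 3))
```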
