The correlation coefficient measures the degree of relationship between two or more
variables. There are several types of correlation coefficient, but the most popular is Karl
Pearson’s correlation coefficient, which measures the strength and direction of a linear
relationship and is commonly used alongside linear regression.
1. “Correlation analysis deals with the association between two or more variables.”
- Simpson & Kafka
2. “If two or more quantities vary in sympathy so that movements in one tend to be
accompanied by corresponding movements in the other(s) then they are said to
be correlated.” - L. R. Conner
3. “When the relationship is of a quantitative nature, the appropriate statistical tool
for discovering and measuring the relationship and expressing it in brief formula
is known as correlation.” - Croxton & Cowden
4. “Correlation analysis attempts to determine the ‘degree of relationship’ between
variables.” - Ya Lun Chou
5. “Correlation is an analysis of the covariation between two or more variables.”
- A. M. Tuttle
Types of correlation:
1. Perfect positive correlation – 𝑟 = +1 or 𝜌 = +1
or
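\[
r = \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - \left(\sum u\right)^2}\,\sqrt{n\sum v^2 - \left(\sum v\right)^2}}
\]
where $r$ = coefficient of correlation,
$n$ = number of pairs of observations,
\[
u = \frac{x - a}{h}, \qquad v = \frac{y - b}{k}
\]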
Type – II
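\[
r = \frac{N\sum fxy - \sum fx \sum fy}{\sqrt{N\sum fx^2 - \left(\sum fx\right)^2}\,\sqrt{N\sum fy^2 - \left(\sum fy\right)^2}}
\]
or
\[
r = \frac{N\sum fuv - \sum fu \sum fv}{\sqrt{N\sum fu^2 - \left(\sum fu\right)^2}\,\sqrt{N\sum fv^2 - \left(\sum fv\right)^2}}
\]
where $r$ = coefficient of correlation,
$N$ = total frequency,
\[
u = \frac{x - a}{h}, \qquad v = \frac{y - b}{k}
\]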
Type – III
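\[
r = \frac{\sum xy}{n\,\sigma_x \sigma_y}
\]
where $r$ = coefficient of correlation,
$n$ = number of pairs of observations,
\[
\sigma_x = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \text{standard deviation of } x, \text{ and}
\]
\[
\sigma_y = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2} = \text{standard deviation of } y.
\]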
or
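\[
r = \frac{\sum uv}{n\,\sigma_u \sigma_v}
\]
where $r$ = coefficient of correlation,
$n$ = number of pairs of observations,
\[
u = \frac{x - a}{h}, \qquad v = \frac{y - b}{k}
\]
\[
\sigma_u = h \times \sqrt{\frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2} = \text{standard deviation of } x, \text{ and}
\]
\[
\sigma_v = k \times \sqrt{\frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2} = \text{standard deviation of } y.
\]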
Example 1: Calculate the Karl Pearson’s coefficient of correlation from the following data
and interpret the output:
Roll No. of students: 1 2 3 4 5
Marks in Accountancy: 48 35 17 23 47
Marks in Statistics: 45 20 40 25 45
Solution: Let 𝑥 and 𝑦 be the marks in Accountancy and Statistics respectively.
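Roll No.   x     y     x²     y²     xy
1          48    45    2304   2025   2160
2          35    20    1225   400    700
3          17    40    289    1600   680
4          23    25    529    625    575
5          47    45    2209   2025   2115
Total      170   175   6556   6675   6230
Here $n = 5$, $\sum x = 170$, $\sum y = 175$, $\sum x^2 = 6556$, $\sum y^2 = 6675$ and $\sum xy = 6230$.
\[
\begin{aligned}
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}
   = \frac{5(6230) - (170)(175)}{\sqrt{5(6556) - (170)^2}\,\sqrt{5(6675) - (175)^2}} \\
  &= \frac{31150 - 29750}{\sqrt{32780 - 28900}\,\sqrt{33375 - 30625}}
   = \frac{1400}{\sqrt{3880}\,\sqrt{2750}}
   = \frac{1400}{(62.2896)(52.4404)} = 0.4286
\end{aligned}
\]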
Positive correlation.
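The arithmetic of Example 1 can be checked with a short Python sketch. The helper name `pearson_r` is an assumption made for illustration and is not part of the original notes; it simply applies the raw-sums formula used above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from raw sums (hypothetical helper name)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    return numerator / denominator

# Data of Example 1
accountancy = [48, 35, 17, 23, 47]
stats_marks = [45, 20, 40, 25, 45]

r = pearson_r(accountancy, stats_marks)
print(round(r, 4))
```

The printed value should agree with the hand calculation to four decimal places.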
Solution:
Year    Production (x)   Unemployed (y)   u = x - 105   v = y - 15   u²    v²    uv
2001    100              15               -5            0            25    0     0
2002    102              12               -3            -3           9     9     9
2003    104              13               -1            -2           1     4     2
2004    107              11               2             -4           4     16    -8
2005    105              12               0             -3           0     9     0
2006    112              12               7             -3           49    9     -21
2007    103              19               -2            4            4     16    -8
2008    99               26               -6            11           36    121   -66
Total (n = 8)                             -8            0            128   184   -92
\[
\begin{aligned}
r &= \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}
   = \frac{8(-92) - (-8)(0)}{\sqrt{8(128) - (-8)^2}\,\sqrt{8(184) - (0)^2}} \\
  &= \frac{-736 + 0}{\sqrt{1024 - 64}\,\sqrt{1472 - 0}}
   = \frac{-736}{\sqrt{960}\,\sqrt{1472}}
   = -\frac{736}{(30.9839)(38.3666)} \\
  &= -\frac{736}{1188.7469} = -0.619 \approx -0.62
\end{aligned}
\]
Negative correlation.
or
Year    Production (x)   Unemployed (y)   u = x - x̄   v = y - ȳ   u²    v²    uv
2001    100              15               -4          0           16    0     0
2002    102              12               -2          -3          4     9     6
2003    104              13               0           -2          0     4     0
2004    107              11               3           -4          9     16    -12
2005    105              12               1           -3          1     9     -3
2006    112              12               8           -3          64    9     -24
2007    103              19               -1          4           1     16    -4
2008    99               26               -5          11          25    121   -55
Total   832              120              0           0           120   184   -92
Here $n = 8$, and
\[
\bar{x} = \frac{\sum x}{n} = \frac{832}{8} = 104 \quad\text{and}\quad \bar{y} = \frac{\sum y}{n} = \frac{120}{8} = 15
\]
\[
\begin{aligned}
r &= \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}
   = \frac{8(-92) - (0)(0)}{\sqrt{8(120) - (0)^2}\,\sqrt{8(184) - (0)^2}} \\
  &= \frac{-736 + 0}{\sqrt{960 - 0}\,\sqrt{1472 - 0}}
   = \frac{-736}{\sqrt{960}\,\sqrt{1472}}
   = -\frac{736}{(30.9839)(38.3666)} \\
  &= -\frac{736}{1188.7469} = -0.6191 \approx -0.62
\end{aligned}
\]
Negative correlation.
Example 3: Calculate coefficient of correlation from the following data:
x: 100 200 300 400 500 600 700
y: 30 50 60 80 100 110 130
Solution:
x      y     x²        y²      xy
100    30    10000     900     3000
200    50    40000     2500    10000
300    60    90000     3600    18000
400    80    160000    6400    32000
500    100   250000    10000   50000
600    110   360000    12100   66000
700    130   490000    16900   91000
Total  2800 (∑x)   560 (∑y)   1400000 (∑x²)   52400 (∑y²)   270000 (∑xy)
Here, $n = 7$
\[
\begin{aligned}
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}
   = \frac{7(270000) - (2800)(560)}{\sqrt{7(1400000) - (2800)^2}\,\sqrt{7(52400) - (560)^2}} \\
  &= \frac{1890000 - 1568000}{\sqrt{9800000 - 7840000}\,\sqrt{366800 - 313600}}
   = \frac{322000}{\sqrt{1960000}\,\sqrt{53200}}
   = \frac{32200}{\sqrt{1960000}\,\sqrt{532}} \\
  &= \frac{32200}{(1400)(23.0651)} = \frac{32200}{32291.1753} = 0.9972
\end{aligned}
\]
Positive correlation.
or
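Take $u = \dfrac{x - a}{h}$ and $v = \dfrac{y - b}{k}$ with $a = 400$, $h = 100$, $b = 80$, $k = 10$.
x      y     u = (x - 400)/100   v = (y - 80)/10   u²   v²   uv
100    30    -3                  -5                9    25   15
200    50    -2                  -3                4    9    6
300    60    -1                  -2                1    4    2
400    80    0                   0                 0    0    0
500    100   1                   2                 1    4    2
600    110   2                   3                 4    9    6
700    130   3                   5                 9    25   15
Total (n = 7)   0                0                 28   76   46
\[
\begin{aligned}
r &= \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}
   = \frac{7(46) - (0)(0)}{\sqrt{7(28) - (0)^2}\,\sqrt{7(76) - (0)^2}} \\
  &= \frac{322}{\sqrt{196 - 0}\,\sqrt{532 - 0}}
   = \frac{322}{(14)(23.0651)} = \frac{322}{322.9118} = 0.9972
\end{aligned}
\]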
Positive correlation.
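Both solutions of Example 3 give the same value because the correlation coefficient is unaffected by a change of origin and scale. A minimal Python sketch of that check follows; the helper name `pearson_r` is an assumption for illustration, and the constants 400, 100, 80 and 10 are the ones used in the second table.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Pearson's r from raw sums (hypothetical helper name)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    num = n * sum(a * b for a, b in zip(xs, ys)) - sx * sy
    den_x = sqrt(n * sum(a * a for a in xs) - sx ** 2)
    den_y = sqrt(n * sum(b * b for b in ys) - sy ** 2)
    return num / (den_x * den_y)

x = [100, 200, 300, 400, 500, 600, 700]
y = [30, 50, 60, 80, 100, 110, 130]

# Step deviations: shift by the assumed means, divide by the class widths.
u = [(xi - 400) / 100 for xi in x]
v = [(yi - 80) / 10 for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(round(r_xy, 4), round(r_uv, 4))
```

Any other choice of origin, or any positive scale factors, should leave the two printed values equal.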
Example 4: Calculate the coefficient of correlation between X and Y from the following data
and calculate probable error. Assume 69 and 112 as the mean value for X and Y respectively.
X: 78 89 99 60 59 79 68 61
Y: 125 137 156 112 107 136 123 108
Solution:
X     Y     u = x - 79   v = y - 136   u²     v²     uv
78    125   -1           -11           1      121    11
89    137   10           1             100    1      10
99    156   20           20            400    400    400
60    112   -19          -24           361    576    456
59    107   -20          -29           400    841    580
79    136   0            0             0      0      0
68    123   -11          -13           121    169    143
61    108   -18          -28           324    784    504
Total (n = 8)   -39      -84           1707   2892   2104
\[
\begin{aligned}
r &= \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}
   = \frac{8(2104) - (-39)(-84)}{\sqrt{8(1707) - (-39)^2}\,\sqrt{8(2892) - (-84)^2}} \\
  &= \frac{16832 - 3276}{\sqrt{13656 - 1521}\,\sqrt{23136 - 7056}}
   = \frac{13556}{\sqrt{12135}\,\sqrt{16080}}
   = \frac{13556}{(110.159)(126.807)} \\
  &= \frac{13556}{13968.9323} = 0.97
\end{aligned}
\]
Positive correlation.
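\[
\begin{aligned}
\text{Probable error} &= 0.6745\left(\frac{1 - r^2}{\sqrt{n}}\right)
 = 0.6745\left(\frac{1 - (0.97)^2}{\sqrt{8}}\right)
 = 0.6745\left(\frac{1 - 0.9409}{2.8284}\right) \\
 &= 0.6745\left(\frac{0.0591}{2.8284}\right) = \frac{0.0399}{2.8284} = 0.0141
\end{aligned}
\]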
Example 5: The following table gives the distribution of items of production and also the
relatively defective items among them, according to size groups. Find the correlation
coefficient between size and defect in quality and its probable error.
Size-group: 15-16 16-17 17-18 18-19 19-20 20-21
No. of items: 200 270 340 360 400 300
No. of defective items: 150 162 170 180 180 114
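Solution: Let 𝑥 be the mid-point of each size group and 𝑦 the percentage of defective items in
that group, i.e. 𝑦 = 100 × (No. of defective items) / (No. of items). For the first group,
𝑦 = 100 × 150 / 200 = 75.
Size-group   Mid-point (x)   y    u = x - 17.5   v = y - 50   u²   v²    uv
15-16        15.5            75   -2             25           4    625   -50
16-17        16.5            60   -1             10           1    100   -10
17-18        17.5            50   0              0            0    0     0
18-19        18.5            50   1              0            1    0     0
19-20        19.5            45   2              -5           4    25    -10
20-21        20.5            38   3              -12          9    144   -36
Total (n = 6)                     3              18           19   894   -106
\[
\begin{aligned}
r &= \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}
   = \frac{6(-106) - (3)(18)}{\sqrt{6(19) - (3)^2}\,\sqrt{6(894) - (18)^2}} \\
  &= \frac{-636 - 54}{\sqrt{114 - 9}\,\sqrt{5364 - 324}}
   = \frac{-690}{\sqrt{105}\,\sqrt{5040}}
   = -\frac{690}{(10.24695)(70.993)} \\
  &= -\frac{690}{727.4613} = -0.9485
\end{aligned}
\]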
Negative correlation.
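\[
\begin{aligned}
\text{Probable error} &= 0.6745\left(\frac{1 - r^2}{\sqrt{n}}\right)
 = 0.6745\left(\frac{1 - (-0.9485)^2}{\sqrt{6}}\right)
 = 0.6745\left(\frac{1 - 0.8996}{2.4495}\right) \\
 &= 0.6745\left(\frac{0.1004}{2.4495}\right) = \frac{0.0677}{2.4495} = 0.0276
\end{aligned}
\]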
Example 6: Given:
Sum of the products of deviations of X and Y series = 3044
Number of pairs of observations = 10
Total of the deviations of X series = - 170
Total of the deviations of Y series = - 20
Total of the squares of deviations of X series = 8288
Total of the squares of deviations of Y series = 2264
Find the coefficient of correlation.
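Solution: We are given $\sum uv = 3044$, $\sum u = -170$, $\sum v = -20$, $\sum u^2 = 8288$,
$\sum v^2 = 2264$, $n = 10$.
\[
\begin{aligned}
r &= \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}
   = \frac{10(3044) - (-170)(-20)}{\sqrt{10(8288) - (-170)^2}\,\sqrt{10(2264) - (-20)^2}} \\
  &= \frac{30440 - 3400}{\sqrt{82880 - 28900}\,\sqrt{22640 - 400}}
   = \frac{27040}{\sqrt{53980}\,\sqrt{22240}}
   = \frac{27040}{(232.33596)(149.130815)} \\
  &= \frac{27040}{34648.45107} \approx 0.78
\end{aligned}
\]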
Positive correlation.
Example 7: The following table gives the frequency, according to groups of marks obtained
by 67 students in an intelligence test. Measure the degree of relationship between age and
intelligence test:
                  Age in years
Test marks    18    19    20    21    Total
200-250       4     4     2     1     11
250-300       3     5     4     2     14
300-350       2     6     8     5     21
350-400       1     4     6     10    21
Total         10    19    20    18    67
Solution:
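Let x be the test marks (class mid-points) with u = (x - 275)/50, and y the age in years with
v = y - 19. Each cell shows the frequency f with the product fuv in parentheses.
Age in years (y)                    18        19        20        21
v                                   -1        0         1         2         Total (f)   fu    fu²   fuv
Test marks   Mid-point x   u
200-250      225           -1       4 (4)     4 (0)     2 (-2)    1 (-2)    11          -11   11    0
250-300      275           0        3 (0)     5 (0)     4 (0)     2 (0)     14          0     0     0
300-350      325           1        2 (-2)    6 (0)     8 (8)     5 (10)    21          21    21    16
350-400      375           2        1 (-2)    4 (0)     6 (12)    10 (40)   21          42    84    50
Total (f)                           10        19        20        18        N = 67      52    116   66
fv                                  -10       0         20        36        46
fv²                                 10        0         20        72        102
fuv                                 0         0         18        48        66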
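The solution as given stops at the correlation table. As a hedged sketch, and assuming the grouped-data (Type II) formula in u and v is the intended next step, the totals can be recomputed from the cell frequencies and substituted as follows. The variable names are illustrative, and the final value of r is computed here, not quoted from the notes.

```python
from math import sqrt

u_vals = [-1, 0, 1, 2]    # test-mark classes 200-250 ... 350-400
v_vals = [-1, 0, 1, 2]    # ages 18, 19, 20, 21
freq = [                  # rows follow u_vals, columns follow v_vals
    [4, 4, 2, 1],
    [3, 5, 4, 2],
    [2, 6, 8, 5],
    [1, 4, 6, 10],
]

# Flatten the table into (u, v, f) triples.
cells = [(u, v, f) for u, row in zip(u_vals, freq) for v, f in zip(v_vals, row)]

N = sum(f for _, _, f in cells)
sfu = sum(f * u for u, _, f in cells)
sfv = sum(f * v for _, v, f in cells)
sfu2 = sum(f * u * u for u, _, f in cells)
sfv2 = sum(f * v * v for _, v, f in cells)
sfuv = sum(f * u * v for u, v, f in cells)

r = (N * sfuv - sfu * sfv) / (
    sqrt(N * sfu2 - sfu ** 2) * sqrt(N * sfv2 - sfv ** 2)
)
print(N, sfu, sfv, sfu2, sfv2, sfuv, round(r, 4))
```

The six totals should match the margins of the table, which makes the sketch a convenient check on the hand-built cells.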
Example 8: The following are the marks obtained by the students of a class in Statistics and
Accountancy:
Marks in                 Marks in Statistics
Accountancy    0-4    4-8    8-12    12-16    16-20    Total
0-4            2      1      1       -        -        4
4-8            1      1      2       1        -        5
8-12           1      1      1       2        1        6
12-16          -      -      2       1        2        5
16-20          -      1      -       1        2        4
Total          4      4      6       5        5        24
Prepare a correlation table taking the magnitude of each class interval as four marks and
the first interval as equal to 0 and less than 4. Calculate Karl Pearson’s coefficient of
correlation between the marks in Statistics and marks in Accountancy.
Solution: Let x be the marks in Statistics and y be the marks in Accountancy.
Take u = (x - 10)/4 and v = (y - 10)/4. Each cell shows the frequency f with the product fuv
in parentheses.
Marks in Statistics (x)               0-4     4-8     8-12    12-16   16-20
Mid-point x                           2       6       10      14      18
u                                     -2      -1      0       1       2       Total (f)   fv    fv²   fuv
Accountancy (y)   Mid-point y   v
0-4               2             -2    2 (8)   1 (2)   1 (0)   -       -       4           -8    16    10
4-8               6             -1    1 (2)   1 (1)   2 (0)   1 (-1)  -       5           -5    5     2
8-12              10            0     1 (0)   1 (0)   1 (0)   2 (0)   1 (0)   6           0     0     0
12-16             14            1     -       -       2 (0)   1 (1)   2 (4)   5           5     5     5
16-20             18            2     -       1 (-2)  -       1 (2)   2 (8)   4           8     16    8
Total (f)                             4       4       6       5       5       N = 24      0     42    25
fu                                    -8      -4      0       5       10      3
fu²                                   16      4       0       5       20      45
fuv                                   10      1       0       2       12      25
Example 9: A computer operator while calculating the correlation coefficient between two
variables X and Y from 25 pairs of observation obtained the following results:
It was, however, discovered at the time of checking that he had copied down two pairs as (6,
14) and (8, 6) while the correct values were (8, 12) and (6, 8). Obtain the correct value of
the correlation coefficient.
Solution:
Incorrect pairs     Correct pairs
x     y             x     y
6     14            8     12
8     6             6     8
Example 2: Ten competitors in a beauty contest are ranked by three judges in the following
order:
1st Judge: 1 6 5 10 3 2 4 9 7 8
2nd Judge: 3 5 8 4 7 10 2 1 6 9
3rd Judge: 6 4 9 8 1 2 3 10 5 7
Use the rank correlation coefficient to determine which pair of judges has the nearest
approach.
Solution:
R1    R2    R3    D12 = R1 - R2   D12²   D23 = R2 - R3   D23²   D13 = R1 - R3   D13²
1     3     6     -2              4      -3              9      -5              25
6     5     4     1               1      1               1      2               4
5     8     9     -3              9      -1              1      -4              16
10    4     8     6               36     -4              16     2               4
3     7     1     -4              16     6               36     2               4
2     10    2     -8              64     8               64     0               0
4     2     3     2               4      -1              1      1               1
9     1     10    8               64     -9              81     -1              1
7     6     5     1               1      1               1      2               4
8     9     7     -1              1      2               4      1               1
Total                             200                    214                    60
\[
R_{12} = 1 - \frac{6\sum D_{12}^2}{n(n^2 - 1)} = 1 - \frac{6(200)}{10(10^2 - 1)} = 1 - \frac{1200}{990} = 1 - 1.2121 = -0.2121
\]
\[
R_{23} = 1 - \frac{6\sum D_{23}^2}{n(n^2 - 1)} = 1 - \frac{6(214)}{10(10^2 - 1)} = 1 - \frac{1284}{990} = 1 - 1.297 = -0.297
\]
\[
R_{13} = 1 - \frac{6\sum D_{13}^2}{n(n^2 - 1)} = 1 - \frac{6(60)}{10(10^2 - 1)} = 1 - \frac{360}{990} = 1 - 0.3636 = 0.6364
\]
Since the rank correlation coefficient is highest for the first and third judges, we
conclude that the first and third judges have the nearest approach.
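The three pairwise comparisons can be sketched in Python as follows. The function name `spearman_rho` and the dictionary keys are assumptions for illustration; the formula is the untied rank correlation formula used in the solution.

```python
def spearman_rho(rank_a, rank_b):
    """Rank correlation 1 - 6*sum(D^2) / (n(n^2 - 1)); assumes no tied ranks."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

judge1 = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
judge2 = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
judge3 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

pairs = {
    "1-2": spearman_rho(judge1, judge2),
    "2-3": spearman_rho(judge2, judge3),
    "1-3": spearman_rho(judge1, judge3),
}
closest = max(pairs, key=pairs.get)   # pair with the highest coefficient
print({k: round(val, 4) for k, val in pairs.items()}, closest)
```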
Solution: First assign ranks and then calculate rank correlation coefficient.
S. No.   Marks by judge X   Marks by judge Y   R1    R2    D = R1 - R2   D²
1        52                 65                 3     3     0             0
2        53                 68                 2     2     0             0
3        42                 43                 5     6     -1            1
4        60                 38                 1     7     -6            36
5        45                 77                 4     1     3             9
6        41                 48                 6     5     1             1
7        37                 35                 8     8     0             0
8        38                 30                 7     9     -2            4
9        25                 25                 10    10    0             0
10       27                 50                 9     4     5             25
                                                           ∑D² = 76
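The table above ends at ∑D² = 76. As a hedged sketch, the ranking step (highest mark gets rank 1, as in the table) and the substitution into the rank correlation formula can be written in Python as below. The helper name `rank_descending` is an assumption, and the final coefficient is computed here and is not quoted from the notes.

```python
def rank_descending(values):
    """Rank 1 for the highest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

marks_x = [52, 53, 42, 60, 45, 41, 37, 38, 25, 27]
marks_y = [65, 68, 43, 38, 77, 48, 35, 30, 25, 50]

r1 = rank_descending(marks_x)
r2 = rank_descending(marks_y)
d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))

n = len(marks_x)
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(r1, r2, d2, round(rho, 4))
```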
Example 4: Obtain the rank correlation coefficient between the variables X and Y from the
following pairs of observed values.
X: 50 55 65 50 55 60 50 65 70 75
Y: 110 110 115 125 140 115 130 120 115 160
Solution: To find the rank correlation coefficient, first rank the values of each variable,
taking the lowest as rank 1, the next higher as rank 2, and so on. Tied values share the
average of the ranks they would otherwise occupy.
X     Y      R1     R2     D = R1 - R2   D²
50    110    2      1.5    0.5           0.25
55    110    4.5    1.5    3.0           9.00
65    115    7.5    4      3.5           12.25
50    125    2      7      -5.0          25.00
55    140    4.5    9      -4.5          20.25
60    115    6      4      2.0           4.00
50    130    2      8      -6.0          36.00
65    120    7.5    6      1.5           2.25
70    115    9      4      5.0           25.00
75    160    10     10     0.0           0.00
                                         ∑D² = 134
In X, the value 50 occurs three times and 55 and 65 occur twice each; in Y, 110 occurs twice
and 115 three times. Writing $m_i$ for the number of times the i-th tied value is repeated,
we have $m = 3, 2, 2, 2, 3$.
\[
\begin{aligned}
R &= 1 - \frac{6\left[\sum D^2 + \frac{1}{12}\sum_{i=1}^{5}\left(m_i^3 - m_i\right)\right]}{n(n^2 - 1)} \\
  &= 1 - \frac{6\left[134 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{10(10^2 - 1)} \\
  &= 1 - \frac{6\left[134 + \frac{1}{12}(24) + \frac{1}{12}(6) + \frac{1}{12}(6) + \frac{1}{12}(6) + \frac{1}{12}(24)\right]}{10(10^2 - 1)} \\
  &= 1 - \frac{6\left[134 + 2 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + 2\right]}{990}
   = 1 - \frac{6(139.5)}{990} = 1 - \frac{837}{990} = 1 - 0.8455 = 0.1545
\end{aligned}
\]
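The tied-rank procedure of Example 4 can be sketched in Python. The helper names `average_ranks` and `tie_correction` are assumptions for illustration; the code follows the correction-factor formula used in the solution, adding (m³ − m)/12 for every group of m tied values in either series.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = lowest value; tied values share the mean of their positions."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1      # first position occupied by v
        count = ordered.count(v)          # number of times v is repeated
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m)/12 over every value repeated m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

x = [50, 55, 65, 50, 55, 60, 50, 65, 70, 75]
y = [110, 110, 115, 125, 140, 115, 130, 120, 115, 160]

r1, r2 = average_ranks(x), average_ranks(y)
d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
cf = tie_correction(x) + tie_correction(y)

n = len(x)
rho = 1 - 6 * (d2 + cf) / (n * (n ** 2 - 1))
print(d2, cf, round(rho, 4))
```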
\[
\Rightarrow 0.2 = \frac{990 - 6\sum D^2}{990} \Rightarrow 0.2 \times 990 = 990 - 6\sum D^2
\]
or
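\[
(x - \bar{x}) = b_{uv}(y - \bar{y}) \quad\text{where}\quad b_{uv} = \frac{n\sum uv - \sum u \sum v}{n\sum v^2 - (\sum v)^2} \times \frac{h}{k} = r\,\frac{\sigma_u}{\sigma_v}
\]
or
\[
(x - \bar{x}) = b_{xy}(y - \bar{y}) \quad\text{where}\quad b_{xy} = \frac{N\sum fxy - \sum fx \sum fy}{N\sum fy^2 - (\sum fy)^2} = r\,\frac{\sigma_x}{\sigma_y}
\]
or
\[
(x - \bar{x}) = b_{uv}(y - \bar{y}) \quad\text{where}\quad b_{uv} = \frac{N\sum fuv - \sum fu \sum fv}{N\sum fv^2 - (\sum fv)^2} \times \frac{h}{k} = r\,\frac{\sigma_u}{\sigma_v}
\]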
b) Line of regression of y on x:
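\[
(y - \bar{y}) = b_{yx}(x - \bar{x}) \quad\text{where}\quad b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = r\,\frac{\sigma_y}{\sigma_x}
\]
or
\[
(y - \bar{y}) = b_{vu}(x - \bar{x}) \quad\text{where}\quad b_{vu} = \frac{n\sum uv - \sum u \sum v}{n\sum u^2 - (\sum u)^2} \times \frac{k}{h} = r\,\frac{\sigma_v}{\sigma_u}
\]
or
\[
(y - \bar{y}) = b_{yx}(x - \bar{x}) \quad\text{where}\quad b_{yx} = \frac{N\sum fxy - \sum fx \sum fy}{N\sum fx^2 - (\sum fx)^2} = r\,\frac{\sigma_y}{\sigma_x}
\]
or
\[
(y - \bar{y}) = b_{vu}(x - \bar{x}) \quad\text{where}\quad b_{vu} = \frac{N\sum fuv - \sum fu \sum fv}{N\sum fu^2 - (\sum fu)^2} \times \frac{k}{h} = r\,\frac{\sigma_y}{\sigma_x}
\]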
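\[
\sqrt{b_{xy} \times b_{yx}} = \sqrt{r\,\frac{\sigma_x}{\sigma_y} \times r\,\frac{\sigma_y}{\sigma_x}} = \sqrt{r^2} = r
\]
or
\[
\begin{aligned}
\sqrt{\frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \times \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}}
 &= \sqrt{\frac{\left(n\sum xy - \sum x \sum y\right)^2}{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \\
 &= \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = r
\end{aligned}
\]
The square root gives only the magnitude of 𝑟. The sign of 𝑟 is the common sign of the two
regression coefficients, so 𝑟 is negative when both 𝑏𝑥𝑦 and 𝑏𝑦𝑥 are negative.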
Example 1: Calculate the regression lines from the following data. Also calculate coefficient
of correlation:
Roll No. of students: 1 2 3 4 5
Marks in Accountancy: 48 35 17 23 47
Marks in Statistics: 45 20 40 25 45
Solution: Let 𝑥 and 𝑦 be the marks in Accountancy and Statistics respectively.
Roll No.   x     y     x²     y²     xy
1          48    45    2304   2025   2160
2          35    20    1225   400    700
3          17    40    289    1600   680
4          23    25    529    625    575
5          47    45    2209   2025   2115
Total      170   175   6556   6675   6230
Here $n = 5$, and
\[
\bar{x} = \frac{\sum x}{n} = \frac{170}{5} = 34 \quad\text{and}\quad \bar{y} = \frac{\sum y}{n} = \frac{175}{5} = 35
\]
Line of regression of x on y:
\[
b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \frac{5(6230) - (170)(175)}{5(6675) - (175)^2} = \frac{31150 - 29750}{33375 - 30625} = \frac{1400}{2750} = 0.51
\]
\[
(x - \bar{x}) = b_{xy}(y - \bar{y}) \Rightarrow (x - 34) = 0.51(y - 35) \Rightarrow (x - 34) = 0.51y - 17.85 \Rightarrow x = 0.51y - 17.85 + 34
\]
\[
x = 0.51y + 16.15
\]
Line of regression of y on x:
\[
b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{5(6230) - (170)(175)}{5(6556) - (170)^2} = \frac{31150 - 29750}{32780 - 28900} = \frac{1400}{3880} = 0.36
\]
\[
(y - \bar{y}) = b_{yx}(x - \bar{x}) \Rightarrow (y - 35) = 0.36(x - 34) \Rightarrow (y - 35) = 0.36x - 12.24 \Rightarrow y = 0.36x - 12.24 + 35
\]
\[
y = 0.36x + 22.76
\]
\[
\text{Correlation coefficient} = r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.51(0.36)} = \sqrt{0.1836} = 0.4285
\]
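The two regression lines of Example 1 can be reproduced with a short Python sketch. The function name `regression_lines` is an assumption for illustration. Because the code keeps the slopes unrounded, its intercepts differ slightly from the hand calculation, which rounds the slopes to two decimals before computing the constants.

```python
def regression_lines(xs, ys):
    """Return (b_xy, c_x, b_yx, c_y) for x = b_xy*y + c_x and y = b_yx*x + c_y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    b_xy = num / (n * syy - sy ** 2)      # regression coefficient of x on y
    b_yx = num / (n * sxx - sx ** 2)      # regression coefficient of y on x
    x_bar, y_bar = sx / n, sy / n
    return b_xy, x_bar - b_xy * y_bar, b_yx, y_bar - b_yx * x_bar

accountancy = [48, 35, 17, 23, 47]    # x
stats_marks = [45, 20, 40, 25, 45]    # y

b_xy, c_x, b_yx, c_y = regression_lines(accountancy, stats_marks)
print(f"x = {b_xy:.4f}y + {c_x:.4f}")
print(f"y = {b_yx:.4f}x + {c_y:.4f}")
```

Both lines pass through the point of means (34, 35), which gives a quick sanity check on any pair of fitted lines.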
Example 2: Calculate the regression lines from the following data. Also calculate the
coefficient of correlation.
x: 100 102 104 107 105 112 103 99
y: 15 12 13 11 12 12 19 26
Solution:
x      y     u = x - 105   v = y - 15   u²    v²    uv
100    15    -5            0            25    0     0
102    12    -3            -3           9     9     9
104    13    -1            -2           1     4     2
107    11    2             -4           4     16    -8
105    12    0             -3           0     9     0
112    12    7             -3           49    9     -21
103    19    -2            4            4     16    -8
99     26    -6            11           36    121   -66
Total (n = 8)   -8         0            128   184   -92
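\[
\bar{x} = a + \frac{\sum u}{n} = 105 + \frac{(-8)}{8} = 105 - \frac{8}{8} = 104 \quad\text{and}\quad \bar{y} = b + \frac{\sum v}{n} = 15 + \frac{0}{8} = 15
\]
Line of regression of x on y:
\[
b_{uv} = \frac{n\sum uv - \sum u \sum v}{n\sum v^2 - (\sum v)^2} = \frac{8(-92) - (-8)(0)}{8(184) - (0)^2} = \frac{-736 - 0}{1472 - 0} = -0.5
\]
\[
(x - \bar{x}) = b_{uv}(y - \bar{y}) \Rightarrow (x - 104) = -0.5(y - 15) \Rightarrow (x - 104) = -0.5y + 7.5 \Rightarrow x = -0.5y + 7.5 + 104
\]
\[
x = -0.5y + 111.5
\]
Line of regression of y on x:
\[
b_{vu} = \frac{n\sum uv - \sum u \sum v}{n\sum u^2 - (\sum u)^2} = \frac{8(-92) - (-8)(0)}{8(128) - (-8)^2} = \frac{-736 - 0}{1024 - 64} = -0.7667
\]
\[
(y - \bar{y}) = b_{vu}(x - \bar{x}) \Rightarrow (y - 15) = -0.7667(x - 104) \Rightarrow (y - 15) = -0.7667x + 79.7368 \Rightarrow y = -0.7667x + 79.7368 + 15
\]
\[
y = -0.7667x + 94.7368
\]
Since both regression coefficients are negative, 𝑟 takes the negative root:
\[
\text{Correlation coefficient} = r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{0.5(0.7667)} = -\sqrt{0.38335} = -0.6192
\]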
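Example 2 shows why the sign has to be attached separately when r is recovered from the regression coefficients. A minimal Python sketch of that step follows; the function name `r_from_slopes` is an assumption for illustration, and the sums are those of the table above.

```python
from math import sqrt, copysign

def r_from_slopes(b_xy, b_yx):
    """r = +/- sqrt(b_xy * b_yx), carrying the common sign of the two slopes."""
    if b_xy * b_yx < 0:
        raise ValueError("regression coefficients must have the same sign")
    return copysign(sqrt(b_xy * b_yx), b_xy)

# Sums from the table of Example 2 (u = x - 105, v = y - 15)
n, su, sv, su2, sv2, suv = 8, -8, 0, 128, 184, -92

num = n * suv - su * sv
b_xy = num / (n * sv2 - sv ** 2)      # x on y
b_yx = num / (n * su2 - su ** 2)      # y on x

r = r_from_slopes(b_xy, b_yx)
print(b_xy, round(b_yx, 4), round(r, 4))
```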
Example 3: Calculate the lines of regression and coefficient of correlation between X and Y
from the following data.
X: 78 89 99 60 59 79 68 61
Y: 125 137 156 112 107 136 123 108
Solution:
X     Y     u = x - 79   v = y - 136   u²     v²     uv
78    125   -1           -11           1      121    11
89    137   10           1             100    1      10
99    156   20           20            400    400    400
60    112   -19          -24           361    576    456
59    107   -20          -29           400    841    580
79    136   0            0             0      0      0
68    123   -11          -13           121    169    143
61    108   -18          -28           324    784    504
Total (n = 8)   -39      -84           1707   2892   2104
\[
\bar{x} = a + \frac{\sum u}{n} = 79 + \frac{(-39)}{8} = 74.125 \quad\text{and}\quad \bar{y} = b + \frac{\sum v}{n} = 136 + \frac{(-84)}{8} = 125.5
\]
Line of regression of x on y:
\[
b_{uv} = \frac{n\sum uv - \sum u \sum v}{n\sum v^2 - (\sum v)^2} = \frac{8(2104) - (-39)(-84)}{8(2892) - (-84)^2} = \frac{16832 - 3276}{23136 - 7056} = \frac{13556}{16080} = 0.843
\]
\[
(x - \bar{x}) = b_{uv}(y - \bar{y}) \Rightarrow (x - 74.125) = 0.843(y - 125.5) \Rightarrow (x - 74.125) = 0.843y - 105.7965 \Rightarrow x = 0.843y - 105.7965 + 74.125
\]
\[
x = 0.843y - 31.6715
\]
Line of regression of y on x:
\[
b_{vu} = \frac{n\sum uv - \sum u \sum v}{n\sum u^2 - (\sum u)^2} = \frac{8(2104) - (-39)(-84)}{8(1707) - (-39)^2} = \frac{16832 - 3276}{13656 - 1521} = \frac{13556}{12135} = 1.1171
\]
\[
(y - \bar{y}) = b_{vu}(x - \bar{x}) \Rightarrow (y - 125.5) = 1.1171(x - 74.125) \Rightarrow (y - 125.5) = 1.1171x - 82.805 \Rightarrow y = 1.1171x - 82.805 + 125.5
\]
\[
y = 1.1171x + 42.695
\]
\[
\text{Correlation coefficient} = r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.843(1.1171)} = \sqrt{0.9417} = 0.97
\]
Example 4: Find the line of regression of x on y. Also calculate the correlation coefficient
between size and defect in quality.
Size-group: 15-16 16-17 17-18 18-19 19-20 20-21
No. of items: 200 270 340 360 400 300
No. of defective items: 150 162 170 180 180 114
Solution: Let 𝑥 be the mid-point of each size group and 𝑦 the percentage of defective items in
that group.
Size-group   Mid-point (x)   y    u = x - 17.5   v = y - 50   u²   v²    uv
15-16        15.5            75   -2             25           4    625   -50
16-17        16.5            60   -1             10           1    100   -10
17-18        17.5            50   0              0            0    0     0
18-19        18.5            50   1              0            1    0     0
19-20        19.5            45   2              -5           4    25    -10
20-21        20.5            38   3              -12          9    144   -36
Total (n = 6)                     3              18           19   894   -106
\[
\bar{x} = a + \frac{\sum u}{n} = 17.5 + \frac{(3)}{6} = 18 \quad\text{and}\quad \bar{y} = b + \frac{\sum v}{n} = 50 + \frac{(18)}{6} = 53
\]
Line of regression of x on y:
\[
b_{uv} = \frac{n\sum uv - \sum u \sum v}{n\sum v^2 - (\sum v)^2} = \frac{6(-106) - (3)(18)}{6(894) - (18)^2} = \frac{-636 - 54}{5364 - 324} = \frac{-690}{5040} = -0.1369
\]
\[
(x - \bar{x}) = b_{uv}(y - \bar{y}) \Rightarrow (x - 18) = -0.1369(y - 53) \Rightarrow (x - 18) = -0.1369y + 7.2557 \Rightarrow x = -0.1369y + 7.2557 + 18
\]
\[
x = -0.1369y + 25.2557
\]
Regression coefficient of y on x:
\[
b_{vu} = \frac{n\sum uv - \sum u \sum v}{n\sum u^2 - (\sum u)^2} = \frac{6(-106) - (3)(18)}{6(19) - (3)^2} = \frac{-636 - 54}{114 - 9} = \frac{-690}{105} = -6.5714
\]
Since both regression coefficients are negative, 𝑟 takes the negative root:
\[
\text{Correlation coefficient} = r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{0.1369(6.5714)} = -\sqrt{0.8996} = -0.9485
\]
Example 5: The following are the marks obtained by the students of a class in Statistics and
Accountancy:
Marks in                 Marks in Statistics
Accountancy    0-4    4-8    8-12    12-16    16-20    Total
0-4            2      1      1       -        -        4
4-8            1      1      2       1        -        5
8-12           1      1      1       2        1        6
12-16          -      -      2       1        2        5
16-20          -      1      -       1        2        4
Total          4      4      6       5        5        24
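Let 𝑥 be the marks in Statistics and 𝑦 the marks in Accountancy, with 𝑢 = (𝑥 − 10)/4 and
𝑣 = (𝑦 − 10)/4. From the correlation table, 𝑁 = 24, ∑𝑓𝑢 = 3, ∑𝑓𝑣 = 0, ∑𝑓𝑢² = 45, ∑𝑓𝑣² = 42 and
∑𝑓𝑢𝑣 = 25, so that
\[
\bar{x} = a + h\,\frac{\sum fu}{N} = 10 + 4 \times \frac{3}{24} = 10.5 \quad\text{and}\quad \bar{y} = b + k\,\frac{\sum fv}{N} = 10 + 4 \times \frac{0}{24} = 10
\]
Line of regression of x on y:
\[
b_{uv} = \frac{N\sum fuv - \sum fu \sum fv}{N\sum fv^2 - (\sum fv)^2} \times \frac{h}{k} = \frac{24(25) - (3)(0)}{24(42) - (0)^2} \times \frac{4}{4} = \frac{600 - 0}{1008 - 0} = 0.5952
\]
\[
(x - \bar{x}) = b_{uv}(y - \bar{y}) \Rightarrow (x - 10.5) = 0.5952(y - 10) \Rightarrow (x - 10.5) = 0.5952y - 5.952 \Rightarrow x = 0.5952y - 5.952 + 10.5
\]
\[
x = 0.5952y + 4.548
\]
Line of regression of y on x:
\[
b_{vu} = \frac{N\sum fuv - \sum fu \sum fv}{N\sum fu^2 - (\sum fu)^2} \times \frac{k}{h} = \frac{24(25) - (3)(0)}{24(45) - (3)^2} \times \frac{4}{4} = \frac{600 - 0}{1080 - 9} = \frac{600}{1071} = 0.56
\]
\[
(y - \bar{y}) = b_{vu}(x - \bar{x}) \Rightarrow (y - 10) = 0.56(x - 10.5) \Rightarrow (y - 10) = 0.56x - 5.88 \Rightarrow y = 0.56x - 5.88 + 10
\]
\[
y = 0.56x + 4.12
\]
\[
\text{Correlation coefficient} = r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.5952(0.56)} = \sqrt{0.333312} = 0.5773
\]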
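For grouped data the means and regression coefficients can also be computed directly from the class mid-points, which avoids forgetting the class width when converting back from step deviations. The Python sketch below does this for the table of Example 5, assuming Statistics is x (columns) and Accountancy is y (rows); the variable names are illustrative.

```python
# Class mid-points: columns are Statistics (x), rows are Accountancy (y).
x_mid = [2, 6, 10, 14, 18]
y_mid = [2, 6, 10, 14, 18]
freq = [
    [2, 1, 1, 0, 0],
    [1, 1, 2, 1, 0],
    [1, 1, 1, 2, 1],
    [0, 0, 2, 1, 2],
    [0, 1, 0, 1, 2],
]

# Flatten the table into (x, y, f) triples.
cells = [(x, y, f) for y, row in zip(y_mid, freq) for x, f in zip(x_mid, row)]

N = sum(f for _, _, f in cells)
sfx = sum(f * x for x, _, f in cells)
sfy = sum(f * y for _, y, f in cells)
sfxx = sum(f * x * x for x, _, f in cells)
sfyy = sum(f * y * y for _, y, f in cells)
sfxy = sum(f * x * y for x, y, f in cells)

x_bar, y_bar = sfx / N, sfy / N
num = N * sfxy - sfx * sfy
b_xy = num / (N * sfyy - sfy ** 2)    # x on y
b_yx = num / (N * sfxx - sfx ** 2)    # y on x

print(x_bar, y_bar, round(b_xy, 4), round(b_yx, 4))
print(f"x = {b_xy:.4f}y + {x_bar - b_xy * y_bar:.4f}")
print(f"y = {b_yx:.4f}x + {y_bar - b_yx * x_bar:.4f}")
```

Computed this way, the mean of the Statistics marks comes out as 10.5 and the mean of the Accountancy marks as 10.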
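Y on X:
\[
(Y - \bar{Y}) = r\,\frac{\sigma_Y}{\sigma_X}(X - \bar{X}) \Rightarrow (Y - 67) = 0.8\left(\frac{3.5}{2.5}\right)(X - 65)
\]
\[
(X - \bar{X}) = r\,\frac{\sigma_X}{\sigma_Y}(Y - \bar{Y}) \Rightarrow (X - \bar{X})\,\sigma_Y = r\,\sigma_X (Y - \bar{Y})
\]
\[
(Y - \bar{Y}) = r\,\frac{\sigma_Y}{\sigma_X}(X - \bar{X}) \Rightarrow (Y - \bar{Y})\,\sigma_X = r\,\sigma_Y (X - \bar{X})
\]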