
Chapter 5

Regression and Correlation

Regression:
The dependence of one variable (the dependent variable) on one or more other variables is
called regression.
It is used to estimate or predict the average value of the dependent variable from
the known values of the independent variable(s).

Simple regression:
When we study the dependence of a variable on a single independent variable, it is
called simple regression.
Example:
1. Consumption depends on income.
Here consumption is the dependent variable, whereas income is the independent variable.

Multiple Regression:
The dependence of one variable on more than one independent variable is called
multiple regression.
Example:
The yield of wheat depends on fertilizer, seed, sand, etc.

Deterministic and probabilistic relations

Deterministic Relation:
When an exact relationship exists between two variables, it is called a
deterministic relation.
Example: the conversion from degrees Celsius (C) to degrees Fahrenheit (F),

$$F = 32 + \frac{9}{5}C$$

Probabilistic relation:
When the relationship is inexact, it is called a probabilistic relation.
Example:
Dependence of wheat production on different factors is an example of probabilistic
relation.
Scatter diagram:
A scatter diagram is used to check whether the relationship between two variables is
linear. To construct it, each observed pair (X, Y) is plotted as a point, with X (the
independent variable) on the x-axis and Y (the dependent variable) on the y-axis.

Example: Draw a scatter diagram for the following data.

X     Y
2     1
4     2
6     3
8     4
10    5

[Figure: Scatter Diagram (X on the x-axis, Y on the y-axis)]
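The construction described above can be sketched in plain text. This is a minimal Python sketch, assuming no plotting library is available (in practice a library such as matplotlib would normally draw the diagram); the function name `ascii_scatter` and the grid size are illustrative choices, not part of these notes. Each (X, Y) pair becomes one `*`, with X increasing to the right and Y increasing upward.

```python
def ascii_scatter(xs, ys, width=21, height=9):
    """Return text rows that plot each (x, y) pair as '*'.

    Column 0 maps to min(xs) and the last column to max(xs); the top row
    maps to max(ys) and the bottom row to min(ys). Assumes xs and ys each
    contain at least two distinct values.
    """
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y_hi - y) / (y_hi - y_lo) * (height - 1))
        grid[row][col] = "*"
    # The left border stands in for the Y axis, the bottom line for the X axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


X = [2, 4, 6, 8, 10]
Y = [1, 2, 3, 4, 5]
for line in ascii_scatter(X, Y):
    print(line)
```

For these data the five stars fall on a straight diagonal, which is what suggests a linear relationship between X and Y.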
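Equation of Regression:

The regression line of Y on X is

$$\hat{y} = a + bX$$

Formula:

$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$

And

$$a = \bar{Y} - b\bar{X}, \qquad \bar{Y} = \frac{\sum Y}{n}, \qquad \bar{X} = \frac{\sum X}{n}$$

Where,
ŷ = estimated (predicted) value of Y
Y = dependent variable
X = independent variable
a = constant or intercept
b = regression co-efficient or slope of the line
n = number of pairs of observations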

Example Table (working layout: one row per pair of observations, column totals in the last row):

       X     Y     XY     X²
TOTAL  ΣX    ΣY    ΣXY    ΣX²
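The sum formulas above translate directly into code. A minimal Python sketch, assuming the observations are given as two equal-length lists; `fit_regression` is a hypothetical helper name. It accumulates the same column totals as the working table (ΣX, ΣY, ΣXY, ΣX²) and returns the intercept a and slope b, here applied to the data of the question that follows.

```python
def fit_regression(xs, ys):
    """Least-squares line y-hat = a + b*X using the column-total formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # a = Y-bar - b * X-bar
    return a, b


a, b = fit_regression([1, 2, 3, 4, 6], [1, 4, 9, 16, 18])
print(f"y-hat = {a:.2f} + {b:.3f}X")
```

Keeping b unrounded until the end avoids small discrepancies in a that appear when b is rounded first.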
Q. For the data given below, find the regression equation of Y on X.

       X     Y     XY     X²     Y²
       1     1     1      1      1
       2     4     8      4      16
       3     9     27     9      81
       4     16    64     16     256
       6     18    108    36     324
TOTAL  16    48    208    66     678

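$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{5(208) - (16)(48)}{5(66) - (16)^2} = \frac{1040 - 768}{330 - 256} = \frac{272}{74} \approx 3.676$$

$$\bar{Y} = \frac{\sum Y}{n} = \frac{48}{5} = 9.6, \qquad \bar{X} = \frac{\sum X}{n} = \frac{16}{5} = 3.2$$

$$a = \bar{Y} - b\bar{X} = 9.6 - 3.676(3.2) \approx -2.16$$

Regression Equation:

$$\hat{y} = -2.16 + 3.676X$$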
Q. For the data below, X = rainfall and Y = yield of wheat. Find:
i. the regression equation that predicts yield;
ii. the errors (residuals) e = Y - ŷ, and show that Σe = 0;
iii. a scatter diagram of the data.

       X      Y      XY         X²         ŷ = 15.73 + 3.16X    e = Y - ŷ
       12.9   62.5   806.25     166.41     56.5                 6.0
       7.2    28.7   206.64     51.84      38.5                 -9.8
       11.3   52.2   589.86     127.69     51.4                 0.8
       18.6   80.6   1499.16    345.96     74.5                 6.1
       8.8    41.6   366.08     77.44      43.5                 -1.9
       10.3   71.3   734.39     106.09     48.3                 23.0
       15.9   54.4   864.96     252.81     66.0                 -11.6
       13.1   44.5   582.95     171.61     57.1                 -12.6
TOTAL  98.1   435.8  5650.29    1299.85    435.8                0

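$$\hat{y} = a + bX$$

Where

$$\bar{Y} = \frac{\sum Y}{n} = \frac{435.8}{8} = 54.475, \qquad \bar{X} = \frac{\sum X}{n} = \frac{98.1}{8} = 12.2625$$

$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{8(5650.29) - (98.1)(435.8)}{8(1299.85) - (98.1)^2} = \frac{2450.34}{775.19} \approx 3.16$$

$$a = \bar{Y} - b\bar{X} = 54.475 - 3.16(12.2625) \approx 15.73$$

Regression equation:

$$\hat{y} = 15.73 + 3.16X$$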
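Part (ii) can be checked numerically. This sketch, assuming the eight (rainfall, yield) pairs from the table, refits the line at full precision and lists each residual e = Y - ŷ. Their sum comes out as zero up to floating-point rounding, which holds for any least-squares line that includes an intercept a. The variable names are illustrative.

```python
rainfall = [12.9, 7.2, 11.3, 18.6, 8.8, 10.3, 15.9, 13.1]      # X
wheat_yield = [62.5, 28.7, 52.2, 80.6, 41.6, 71.3, 54.4, 44.5]  # Y

n = len(rainfall)
sum_x, sum_y = sum(rainfall), sum(wheat_yield)
sum_xy = sum(x * y for x, y in zip(rainfall, wheat_yield))
sum_x2 = sum(x * x for x in rainfall)

# Least-squares slope and intercept, not rounded (unlike the hand calculation).
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

fitted = [a + b * x for x in rainfall]
residuals = [y - y_hat for y, y_hat in zip(wheat_yield, fitted)]

for x, y, y_hat, e in zip(rainfall, wheat_yield, fitted, residuals):
    print(f"X={x:5.1f}  Y={y:5.1f}  y-hat={y_hat:6.2f}  e={e:6.2f}")
print(f"sum of residuals = {sum(residuals):.10f}")
```

The individual residuals differ from the table in the second decimal place because the table uses a and b rounded to 15.73 and 3.16.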
[Figure: Scatter Diagram of wheat production (Y) against rainfall (X)]

Points plotted (X on the x-axis, Y on the y-axis):
1. (12.9, 62.5)
2. (7.2, 28.7)
3. (11.3, 52.2)
4. (18.6, 80.6)
5. (8.8, 41.6)
6. (10.3, 71.3)
7. (15.9, 54.4)
8. (13.1, 44.5)

Where
Y = Wheat Production (y-axis)
X = Rainfall (x-axis)
Standard Deviation of Regression or Standard Error of Estimate

$$S_{y \cdot x} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$

Activity: for the previous question, find the standard error of estimate.
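One way to work the activity is sketched below, assuming the rainfall data from the previous question. The helper name `standard_error_of_estimate` is hypothetical. It keeps a and b at full precision because the numerator Σy² - aΣy - bΣxy is a small difference of large totals, so plugging in rounded values such as 15.73 and 3.16 shifts the answer slightly.

```python
import math


def standard_error_of_estimate(xs, ys):
    """S_y.x = sqrt((Σy² - aΣy - bΣxy) / (n - 2)) for the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Full-precision a and b; the numerator below is sensitive to rounding.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


rainfall = [12.9, 7.2, 11.3, 18.6, 8.8, 10.3, 15.9, 13.1]
wheat_yield = [62.5, 28.7, 52.2, 80.6, 41.6, 71.3, 54.4, 44.5]
s_yx = standard_error_of_estimate(rainfall, wheat_yield)
print(f"S_y.x = {s_yx:.2f}")
```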
CO-EFFICIENT OF DETERMINATION

$$R^2 = \frac{a\sum y + b\sum xy - \frac{(\sum y)^2}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$$

Practice:
Find R² for the previous data.
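A similar sketch for the practice exercise, again assuming the rainfall data and full-precision a and b; `coefficient_of_determination` is a hypothetical name. The term (Σy)²/n appears in both the numerator and the denominator of the formula, so it is computed once.

```python
def coefficient_of_determination(xs, ys):
    """R² = (aΣy + bΣxy - (Σy)²/n) / (Σy² - (Σy)²/n) for the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    correction = sum_y ** 2 / n  # (Σy)² / n
    return (a * sum_y + b * sum_xy - correction) / (sum_y2 - correction)


rainfall = [12.9, 7.2, 11.3, 18.6, 8.8, 10.3, 15.9, 13.1]
wheat_yield = [62.5, 28.7, 52.2, 80.6, 41.6, 71.3, 54.4, 44.5]
r2 = coefficient_of_determination(rainfall, wheat_yield)
print(f"R^2 = {r2:.3f}")
```

For a straight-line fit with an intercept, R² equals the square of the correlation co-efficient r described in the next section, which gives an independent check.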
Correlation:
The interdependence between two variables is called correlation.
Example: the relationship between gold prices and oil prices.

Positive Correlation:
Two variables are said to be positively correlated if they tend to increase or
decrease together, that is, move in the same direction.
Example: The length of an iron bar increases as the temperature
increases.
Negative Correlation:
When one variable tends to increase as the other decreases, the correlation is
negative.
Example: The number of coronavirus patients decreases as more people stay at
home or in isolation.

[Figure: Positive Correlation]

[Figure: Negative Correlation]
Correlation is measured by r, the correlation co-efficient:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

The working table has the following columns, with column totals in the last row:

       x     y     xy     x²     y²
TOTAL  Σx    Σy    Σxy    Σx²    Σy²

Q. Find the correlation co-efficient for the given data.

       X     Y     X²     Y²     XY
       1     2     1      4      2
       2     5     4      25     10
       3     3     9      9      9
       4     8     16     64     32
       5     7     25     49     35
TOTAL  15    25    55     151    88

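$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} = \frac{5(88) - (15)(25)}{\sqrt{[5(55) - (15)^2][5(151) - (25)^2]}}$$

$$r = \frac{440 - 375}{\sqrt{(275 - 225)(755 - 625)}} = \frac{65}{\sqrt{50 \times 130}} \approx 0.806 \approx 0.8$$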
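The computation above, as a minimal Python sketch assuming paired lists of equal length; `pearson_r` is a hypothetical helper name that follows the sum formula for r term by term.

```python
import math


def pearson_r(xs, ys):
    """Correlation co-efficient r from the column totals Σx, Σy, Σxy, Σx², Σy²."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


X = [1, 2, 3, 4, 5]
Y = [2, 5, 3, 8, 7]
r = pearson_r(X, Y)
print(round(r, 3))  # 0.806
```

On Python 3.10 or later, `statistics.correlation(X, Y)` from the standard library should give the same value and can serve as a cross-check.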
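Properties of correlation co-efficient
i. r is symmetric in x and y: $r_{xy} = r_{yx}$.
ii. It lies between -1 and +1.
iii. It is independent of change of origin and scale: if $u = (x - A)/h$ and $v = (y - B)/k$ with $h, k > 0$, then $r_{xy} = r_{uv}$.
iv. r is the geometric mean of the two regression co-efficients:

$$r = \pm\sqrt{b_{yx}\, b_{xy}}$$

where r takes the common sign of $b_{yx}$ and $b_{xy}$, and

$$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}, \qquad b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$$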
Q. For the data below, verify that $r = \sqrt{b_{yx}\, b_{xy}}$.

X 1 2 3 4 5
Y 2 5 3 8 7
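Solution:
From the previous question, we already know that for these data
n = 5, ΣX = 15, ΣY = 25, ΣX² = 55, ΣY² = 151, ΣXY = 88, r ≈ 0.806

$$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{5(88) - (15)(25)}{5(55) - (15)^2} = \frac{440 - 375}{275 - 225} = \frac{65}{50} = 1.3$$

$$b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} = \frac{5(88) - (15)(25)}{5(151) - (25)^2} = \frac{440 - 375}{755 - 625} = \frac{65}{130} = 0.5$$

$$r = \sqrt{b_{yx}\, b_{xy}} = \sqrt{1.3 \times 0.5} = \sqrt{0.65} \approx 0.806$$

This agrees with the value of r computed directly.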
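The same verification as a short Python sketch, assuming the five (X, Y) pairs above. The sign of r is taken from b_yx, since b_yx, b_xy and r always share the same sign; the variable names are illustrative.

```python
import math

X = [1, 2, 3, 4, 5]
Y = [2, 5, 3, 8, 7]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

s_xy = n * sum_xy - sum_x * sum_y          # shared numerator (65 here)
b_yx = s_xy / (n * sum_x2 - sum_x ** 2)    # slope of the regression of Y on X
b_xy = s_xy / (n * sum_y2 - sum_y ** 2)    # slope of the regression of X on Y

# Geometric mean of the two slopes, carrying their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(b_yx, b_xy, round(r, 3))  # 1.3 0.5 0.806
```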

Q. For the data given below, show that $r_{xy} = r_{uv}$, where u = X - 69 and v = Y - 112.

       X     Y      XY      X²      Y²       U     V     U²     V²     UV
       78    125    9750    6084    15625    9     13    81     169    117
       89    137    12193   7921    18769    20    25    400    625    500
       97    156    15132   9409    24336    28    44    784    1936   1232
       69    112    7728    4761    12544    0     0     0      0      0
       59    107    6313    3481    11449    -10   -5    100    25     50
       79    136    10744   6241    18496    10    24    100    576    240
       68    123    8364    4624    15129    -1    11    1      121    -11
       61    108    6588    3721    11664    -8    -4    64     16     32
TOTAL  600   1004   76812   46242   128012   48    108   1530   3468   2160

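$$r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} = \frac{8(76812) - (600)(1004)}{\sqrt{[8(46242) - (600)^2][8(128012) - (1004)^2]}} = \frac{12096}{\sqrt{9936 \times 16080}} \approx 0.96$$

$$r_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{[n\sum u^2 - (\sum u)^2][n\sum v^2 - (\sum v)^2]}} = \frac{8(2160) - (48)(108)}{\sqrt{[8(1530) - (48)^2][8(3468) - (108)^2]}} = \frac{12096}{\sqrt{9936 \times 16080}} \approx 0.96$$

Hence $r_{xy} = r_{uv}$.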
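A numerical check of property (iii), assuming the eight (X, Y) pairs in the table; `pearson_r` is a hypothetical helper implementing the sum formula for r. Besides the shift of origin used in the question (u = X - 69, v = Y - 112), it also applies an illustrative change of origin and scale, w = (Y - 112)/4, which is not part of the original question.

```python
import math


def pearson_r(xs, ys):
    """Correlation co-efficient r from the column totals."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


X = [78, 89, 97, 69, 59, 79, 68, 61]
Y = [125, 137, 156, 112, 107, 136, 123, 108]

U = [x - 69 for x in X]            # change of origin, as in the question
V = [y - 112 for y in Y]
W = [(y - 112) / 4 for y in Y]     # illustrative change of origin and scale

r_xy = pearson_r(X, Y)
r_uv = pearson_r(U, V)
r_uw = pearson_r(U, W)
print(round(r_xy, 3), round(r_uv, 3), round(r_uw, 3))  # 0.957 0.957 0.957
```

Working with u and v keeps the column totals much smaller, which is the practical reason for shifting the origin in hand calculations.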


Q. Find the rank correlation between a and b.

                 Ranks
a       b        X     Y     d = X - Y    d²
7.4     8.5      3     2     1            1
9.0     6.1      2     4     -2           4
11.0    2.4      1     6     -5           25
2.5     6.7      6     3     3            9
4.6     12.6     5     1     4            16
6.5     3.3      4     5     -1           1
                                          Σd² = 56

(Rank 1 is given to the largest value in each column.)

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(56)}{6(6^2 - 1)} = 1 - \frac{336}{210} = -0.60$$
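The same steps in a short Python sketch, assuming no tied values (the tied case is treated in the next question). The helpers `descending_ranks` and `spearman_no_ties` are hypothetical names; the ranking uses a simple list lookup, which is fine for small data sets.

```python
def descending_ranks(values):
    """Rank 1 = largest value. Assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_no_ties(xs, ys):
    """r_s = 1 - 6Σd² / (n(n² - 1)) with d = difference of ranks."""
    n = len(xs)
    rank_x, rank_y = descending_ranks(xs), descending_ranks(ys)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


a = [7.4, 9.0, 11.0, 2.5, 4.6, 6.5]
b = [8.5, 6.1, 2.4, 6.7, 12.6, 3.3]
print(descending_ranks(a))              # [3, 2, 1, 6, 5, 4]
print(round(spearman_no_ties(a, b), 2)) # -0.6
```

Ranking both columns in ascending order instead would give the same r_s, because every difference d only changes sign.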
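Rank Correlation for tied values
Q. Find the rank correlation.

                     Ranks
X      Y      a                     b                   d = a - b    d²
50     90     (3+4)/2 = 3.5         8                   -4.5         20.25
43     95     5                     (6+7)/2 = 6.5       -1.5         2.25
50     112    3.5                   4                   -0.5         0.25
40     120    (6+7+8)/3 = 7         (2+3)/2 = 2.5       4.5          20.25
60     95     1                     6.5                 -5.5         30.25
40     170    7                     1                   6            36
40     100    7                     5                   2            4
55     120    2                     2.5                 -0.5         0.25
                                                                     Σd² = 113.5

Tied values receive the average of the ranks they would otherwise occupy. Each group of t tied values contributes to the correction

$$T = \frac{1}{12}\sum (t^3 - t)$$

Here X has ties of size 2 (the value 50) and 3 (the value 40), and Y has two ties of size 2 (the values 120 and 95):

$$T = \frac{1}{12}\left[(2^3 - 2) + (3^3 - 3) + (2^3 - 2) + (2^3 - 2)\right] = \frac{1}{12}[6 + 24 + 6 + 6] = 3.5$$

$$\text{Adj}\sum d^2 = \sum d^2 + T = 113.5 + 3.5 = 117$$

$$r_s = 1 - \frac{6\,(\text{Adj}\sum d^2)}{n(n^2 - 1)} = 1 - \frac{6(117)}{8(8^2 - 1)} = 1 - \frac{702}{504} \approx -0.39$$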
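A sketch of the tied-rank procedure, assuming the eight (X, Y) pairs in the table. The helpers `midranks_desc`, `tie_correction` and `spearman_tied` are hypothetical names: they assign averaged ranks to ties, add the correction (t³ - t)/12 for every group of tied values in either column, and then apply the adjusted formula for r_s.

```python
from collections import Counter


def midranks_desc(values):
    """Descending ranks (1 = largest); tied values share the mean of their ranks."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first rank position occupied by v
        t = order.count(v)           # size of the tie group containing v
        ranks.append(first + (t - 1) / 2)
    return ranks


def tie_correction(values):
    """T = (1/12) * Σ(t³ - t) over groups of t tied values (t = 1 adds 0)."""
    return sum(t ** 3 - t for t in Counter(values).values()) / 12


def spearman_tied(xs, ys):
    """r_s = 1 - 6 * (Σd² + T) / (n(n² - 1)) using averaged ranks."""
    n = len(xs)
    rank_x, rank_y = midranks_desc(xs), midranks_desc(ys)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    adj_sum_d2 = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adj_sum_d2 / (n * (n ** 2 - 1))


X = [50, 43, 50, 40, 60, 40, 40, 55]
Y = [90, 95, 112, 120, 95, 170, 100, 120]
print(midranks_desc(X))
print(midranks_desc(Y))
print(round(spearman_tied(X, Y), 2))
```

Library routines such as SciPy's `spearmanr` handle ties by computing Pearson's r on the averaged ranks rather than by adding T, so their result can differ slightly from this textbook correction.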
