Regression:
The dependence of one variable (the dependent variable) on one or more other
variables (the independent variables) is called regression.
It is used to estimate or predict the average value of the dependent variable from
known values of the independent variable(s).
Simple regression:
When we study the dependence of a variable on a single independent variable, it is
called simple regression.
Example:
Consumption depends on income.
Here consumption is the dependent variable and income is the independent variable.
Multiple Regression:
The dependence of one variable on more than one independent variable is called
multiple regression.
Example:
The yield of wheat depends on fertilizer, seed, sand, etc.
Deterministic Relation:
When an exact relationship exists between two variables, it is called a
deterministic relation.
Example:
The conversion from degrees Celsius (C) to degrees Fahrenheit (F):
$$F = 32 + \frac{9}{5}C$$
Probabilistic relation:
When the relationship is inexact, it is called a probabilistic relation.
Example:
The dependence of wheat production on different factors is an example of a
probabilistic relation.
Scatter diagram:
A scatter diagram is used to check whether the relationship between two variables
is linear. To construct the diagram, the pairs (X, Y) are plotted as points, with
X (independent variable) on the X-axis and Y (dependent variable) on the Y-axis.
X     Y
4     2
6     3
8     4
10    5
[Figure: Scatter Diagram]
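Equation of Regression:
Formula:
$$\hat{y} = a + bX$$
where
$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$
and
$$a = \bar{Y} - b\bar{X}, \qquad \bar{Y} = \frac{\sum Y}{n}, \qquad \bar{X} = \frac{\sum X}{n}$$
Where,
Y = dependent variable
X = independent variable
a = constant or intercept
b = regression co-efficient or slope of the line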
Example Table:
X      Y      XY     X²
TOTAL
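Q. For the data given below, find the regression equation of Y on X.
         X      Y      XY     X²     Y²
         1      1      1      1      1
         2      4      8      4      16
         3      9      27     9      81
         4      16     64     16     256
         6      18     108    36     324
TOTAL    16     48     208    66     678
$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{5(208) - (16)(48)}{5(66) - (16)^2} = \frac{1040 - 768}{330 - 256} = \frac{272}{74} \approx 3.676$$
$$\bar{Y} = \frac{\sum Y}{n} = \frac{48}{5} = 9.6, \qquad \bar{X} = \frac{\sum X}{n} = \frac{16}{5} = 3.2$$
$$a = \bar{Y} - b\bar{X} = 9.6 - 3.676(3.2) \approx -2.16$$
Regression equation: $\hat{y} = -2.16 + 3.676X$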
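The column sums and the two formulas in the worked example above can be reproduced with a short script. This is a minimal sketch, not part of the original notes; the function name `regression_from_sums` is a hypothetical helper chosen for illustration.

```python
# Data from the worked example (regression of Y on X).
xs = [1, 2, 3, 4, 6]
ys = [1, 4, 9, 16, 18]

def regression_from_sums(xs, ys):
    """Return (a, b) for y-hat = a + b*x using the column-sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # the XY column total
    sum_x2 = sum(x * x for x in xs)               # the X^2 column total
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n                 # a = Y-bar - b * X-bar
    return a, b

a, b = regression_from_sums(xs, ys)
print(f"b = {b:.3f}, a = {a:.3f}")
```

Working from the raw data rather than from hand-copied totals also acts as a check on the table itself.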
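For the rainfall (X) and wheat production (Y) data listed with the scatter diagram
below, n = 8, ΣX = 98.1, ΣY = 435.8, ΣXY = 5650.29 and ΣX² = 1299.85.
$$\hat{y} = a + bX$$
Where
$$\bar{Y} = \frac{\sum Y}{n} = \frac{435.8}{8} = 54.475, \qquad \bar{X} = \frac{\sum X}{n} = \frac{98.1}{8} = 12.2625$$
$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{8(5650.29) - (98.1)(435.8)}{8(1299.85) - (98.1)^2} = 3.16$$
$$a = \bar{Y} - b\bar{X} = 54.475 - 3.16(12.2625) = 15.73$$
Regression equation: $\hat{y} = 15.73 + 3.16X$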
[Figure: Scatter Diagram of wheat production (Y) against rainfall (X)]
No.    X (x-axis)    Y (y-axis)
1      12.9          62.5
2      7.2           28.7
3      11.3          52.2
4      18.6          80.6
5      8.8           41.6
6      10.3          71.3
7      15.9          54.4
8      13.1          44.5
Where
Y = Wheat Production (y-axis)
X = Rainfall (x-axis)
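As a cross-check on the rainfall and wheat figures, the slope can also be computed from deviations about the means, which is algebraically equivalent to the column-sum formula. The sketch below is illustrative only; the deviation form and the helper `predict` are not taken from the notes.

```python
rainfall = [12.9, 7.2, 11.3, 18.6, 8.8, 10.3, 15.9, 13.1]   # X
wheat = [62.5, 28.7, 52.2, 80.6, 41.6, 71.3, 54.4, 44.5]    # Y

n = len(rainfall)
x_bar = sum(rainfall) / n
y_bar = sum(wheat) / n

# Deviation form: b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rainfall, wheat))
s_xx = sum((x - x_bar) ** 2 for x in rainfall)
b = s_xy / s_xx
a = y_bar - b * x_bar

def predict(x):
    """Estimated average wheat production for rainfall x."""
    return a + b * x

print(f"b = {b:.2f}, a = {a:.2f}")
```

The intercept printed here uses the unrounded slope, so it can differ in the second decimal place from a hand calculation that rounds b to 3.16 first. The fitted line always passes through the point of means, which `predict(x_bar)` confirms.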
Standard deviation of regression or Standard error of estimate
$$s_{y.x} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
Activity: For the previous question, find the standard error of estimate.
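One way to check an answer to this activity is to compute the shortcut formula and the direct residual-based definition side by side. The sketch below assumes that the "previous question" is the rainfall and wheat production data; that reading is an assumption, not something the notes state.

```python
import math

# Assumption: the "previous question" is the rainfall (x) / wheat production (y) data.
xs = [12.9, 7.2, 11.3, 18.6, 8.8, 10.3, 15.9, 13.1]
ys = [62.5, 28.7, 52.2, 80.6, 41.6, 71.3, 54.4, 44.5]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Least-squares slope and intercept, kept unrounded.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Shortcut formula from the notes.
s_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Direct definition: squared residuals summed and divided by n - 2.
s_direct = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))

print(f"s_y.x (shortcut) = {s_shortcut:.3f}")
print(f"s_y.x (direct)   = {s_direct:.3f}")
```

The two values agree when a and b are unrounded. Plugging rounded coefficients into the shortcut formula shifts the result slightly, because it subtracts large, nearly equal quantities.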
CO-EFFICIENT OF DETERMINATION
$$R^2 = \frac{a\sum y + b\sum xy - \dfrac{(\sum y)^2}{n}}{\sum y^2 - \dfrac{(\sum y)^2}{n}}$$
Practice:
Find R² for the previous data.
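A practice answer can be checked numerically. The sketch below again assumes the "previous data" are the rainfall and wheat figures, and compares the formula above with the equivalent form 1 - (unexplained variation / total variation), which is a standard identity rather than something stated in these notes.

```python
# Assumption: the "previous data" are the rainfall (x) / wheat production (y) pairs.
xs = [12.9, 7.2, 11.3, 18.6, 8.8, 10.3, 15.9, 13.1]
ys = [62.5, 28.7, 52.2, 80.6, 41.6, 71.3, 54.4, 44.5]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Formula from the notes.
correction = sum_y ** 2 / n
r_squared = (a * sum_y + b * sum_xy - correction) / (sum_y2 - correction)

# Cross-check: 1 - SSE / SST.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - sum_y / n) ** 2 for y in ys)
r_squared_check = 1 - sse / sst

print(f"R^2 = {r_squared:.3f} (check: {r_squared_check:.3f})")
```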
Correlation:
The interdependence between two variables is called correlation.
Example: The relationship between gold prices and oil prices.
Positive Correlation:
Two variables are said to be positively correlated if they tend to increase or
decrease together, that is, they move in the same direction.
Example: The length of an iron bar increases as the temperature
increases.
Negative Correlation:
When one variable increases as the other decreases, it is called
negative correlation.
Example: The number of Corona patients decreases as time spent at
home or in isolation increases.
[Figure: Positive Correlation]
[Figure: Negative Correlation]
Correlation is measured by “r”, the correlation co-efficient.
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^2 - (\sum x)^2\,]\,[\,n\sum y^2 - (\sum y)^2\,]}}$$
x      y      xy     x²     y²
TOTAL
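         X      Y      X²     Y²     XY
         1      2      1      4      2
         2      5      4      25     10
         3      3      9      9      9
         4      8      16     64     32
         5      7      25     49     35
TOTAL    15     25     55     151    88
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^2 - (\sum x)^2\,]\,[\,n\sum y^2 - (\sum y)^2\,]}}$$
$$r = \frac{5(88) - (15)(25)}{\sqrt{[\,5(55) - (15)^2\,]\,[\,5(151) - (25)^2\,]}} = \frac{65}{\sqrt{50 \times 130}}$$
$$r = \frac{13}{\sqrt{10 \times 26}} \approx 0.806 \approx 0.8$$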
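The same calculation can be written as a small function. This is a sketch for checking the arithmetic; `pearson_r` is a hypothetical name, and the function simply applies the column-sum formula used above.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from the column-sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

Swapping the arguments, `pearson_r(ys, xs)`, returns the same value, because the formula treats x and y symmetrically.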
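Properties of correlation co-efficient
i. “r” is symmetrical with respect to “x” and “y”: $r_{xy} = r_{yx}$.
ii. It lies between -1 and +1.
iii. It is independent of origin and scale: $r_{xy} = r_{uv}$, where u and v are x and y
after a change of origin and (positive) scale.
iv. “r” is the geometric mean of the two regression co-efficients, and it takes
their common sign:
$$r = \sqrt{b_{yx} \cdot b_{xy}}$$
where
$$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
$$b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$$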
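Q. For the data below, verify that r is the geometric mean of the two regression
co-efficients.
X    1    2    3    4    5
Y    2    5    3    8    7
Solution:
We already know that in this question
n = 5, ΣX = 15, ΣY = 25, ΣX² = 55, ΣY² = 151, ΣXY = 88, r ≈ 0.806
$$b_{yx} = \frac{440 - 375}{275 - 225} = \frac{65}{50} = 1.3, \qquad b_{xy} = \frac{440 - 375}{755 - 625} = \frac{65}{130} = 0.5$$
$$r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{1.3 \times 0.5} = \sqrt{0.65} \approx 0.806$$
Both methods give r ≈ 0.806 (0.8 to one decimal place), so the property is verified.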
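The verification can be repeated numerically. In this sketch the variable names are illustrative, and `math.copysign` is used so that r takes the sign shared by the two regression coefficients, since a square root alone is never negative.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

shared_numerator = n * sum_xy - sum_x * sum_y           # 440 - 375 = 65
b_yx = shared_numerator / (n * sum_x2 - sum_x ** 2)     # regression of y on x
b_xy = shared_numerator / (n * sum_y2 - sum_y ** 2)     # regression of x on y

# Geometric mean of the two coefficients, carrying their common sign.
r_from_slopes = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Direct correlation formula for comparison.
r_direct = shared_numerator / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"b_yx = {b_yx}, b_xy = {b_xy}")
print(f"r from slopes = {r_from_slopes:.3f}, r direct = {r_direct:.3f}")
```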
Here U = X - 69 and V = Y - 112 (a change of origin).
         X      Y       XY       X²       Y²        U      V      U²      V²      UV
         78     125     9750     6084     15625     9      13     81      169     117
         89     137     12193    7921     18769     20     25     400     625     500
         97     156     15132    9409     24336     28     44     784     1936    1232
         69     112     7728     4761     12544     0      0      0       0       0
         59     107     6313     3481     11449     -10    -5     100     25      50
         79     136     10744    6241     18496     10     24     100     576     240
         68     123     8364     4624     15129     -1     11     1       121     -11
         61     108     6588     3721     11664     -8     -4     64      16      32
TOTAL    600    1004    76812    46242    128012    48     108    1530    3468    2160
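The table above can be used to check that r does not depend on the origin. The sketch below assumes the U and V columns are X - 69 and Y - 112, which matches every row of the table, and computes r from both pairs of columns; `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from the column-sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

X = [78, 89, 97, 69, 59, 79, 68, 61]
Y = [125, 137, 156, 112, 107, 136, 123, 108]

# Change of origin inferred from the table: U = X - 69, V = Y - 112.
U = [x - 69 for x in X]
V = [y - 112 for y in Y]

r_xy = pearson_r(X, Y)
r_uv = pearson_r(U, V)
print(f"r_xy = {r_xy:.3f}, r_uv = {r_uv:.3f}")
```

The shifted columns give much smaller totals to work with by hand, which is the practical reason for changing the origin.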
                Ranks
a       b       X     Y     d = X - Y    d²
7.4     8.5     3     2     1            1
9.0     6.1     2     4     -2           4
11.0    2.4     1     6     -5           25
2.5     6.7     6     3     3            9
4.6     12.6    5     1     4            16
6.5     3.3     4     5     -1           1
                                         Σd² = 56
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
$$r_s = 1 - \frac{6(56)}{6(6^2 - 1)}$$
$$r_s = 1 - \frac{336}{210}$$
$$r_s = -0.60$$
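The ranking and the rank correlation formula above can be sketched in a few lines. The helper `ranks_descending` is a hypothetical name; it gives rank 1 to the largest value, as in the table, and assumes there are no tied values.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

a_vals = [7.4, 9.0, 11.0, 2.5, 4.6, 6.5]
b_vals = [8.5, 6.1, 2.4, 6.7, 12.6, 3.3]

rank_a = ranks_descending(a_vals)   # the X rank column
rank_b = ranks_descending(b_vals)   # the Y rank column

d = [ra - rb for ra, rb in zip(rank_a, rank_b)]
sum_d2 = sum(di ** 2 for di in d)

n = len(a_vals)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(f"sum of d^2 = {sum_d2}, r_s = {r_s:.2f}")
```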
Rank Correlation for tied values
Q. Find rank Correlation
                 Ranks
X     Y      a                  b                d = a - b    d²
50    90     (3+4)/2 = 3.5      8                -4.5         20.25
43    95     5                  (6+7)/2 = 6.5    -1.5         2.25
50    112    3.5                4                -0.5         0.25
40    120    (6+7+8)/3 = 7      (2+3)/2 = 2.5    4.5          20.25
60    95     1                  6.5              -5.5         30.25
40    170    7                  1                6            36
40    100    7                  5                2            4
55    120    2                  2.5              -0.5         0.25
                                                              Σd² = 113.5
The correction for ties, where t is the number of values in each tied group:
$$T = \frac{1}{12}\sum (t^3 - t)$$
$$T = \frac{1}{12}\left[(2^3 - 2) + (3^3 - 3) + (2^3 - 2) + (2^3 - 2)\right]$$
$$T = \frac{1}{12}\,[\,6 + 24 + 6 + 6\,]$$
$$T = 3.5$$
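The averaged ranks, Σd² and the tie correction T can be checked with a short script. This is a sketch only; `average_ranks` and `tie_correction` are hypothetical helper names, and ranking runs from the largest value (rank 1) downward, as in the table.

```python
from collections import Counter

def average_ranks(values):
    """Rank from largest (1) to smallest; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(*columns):
    """T = (1/12) * sum(t^3 - t) over every group of t tied values in any column."""
    total = 0
    for column in columns:
        for t in Counter(column).values():
            if t > 1:
                total += t ** 3 - t
    return total / 12

X = [50, 43, 50, 40, 60, 40, 40, 55]
Y = [90, 95, 112, 120, 95, 170, 100, 120]

rank_a = average_ranks(X)
rank_b = average_ranks(Y)
sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b))
T = tie_correction(X, Y)

print("ranks a:", rank_a)
print("ranks b:", rank_b)
print(f"sum of d^2 = {sum_d2}, T = {T}")
```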