
Statistical Methods

Correlation and Regression

By
Govinda Gyawali
Regression Analysis

It is the average mathematical relationship between the dependent and independent variables, expressed in the original units of the data.
It is also known as the predicting or forecasting equation.
Dependent variable: the variable whose value is to be estimated or predicted.
Independent variable: the variable whose value is given and used for the estimation.
Multiple Regression
A regression equation that uses two or more independent variables is known as a multiple regression equation.
Consider the regression line of Y on X1 and X2:
Y = a + b1X1 + b2X2 + e …… (i)
where Y = dependent variable, X1 and X2 = independent variables, a = Y-intercept, b1 and b2 = regression coefficients, and e = random error.
The values of the parameters a, b1 and b2 are obtained from the normal equations, which follow from the principle of least squares:
∑Y = na + b1∑X1 + b2∑X2 …… (ii)
∑X1Y = a∑X1 + b1∑X1² + b2∑X1X2 …… (iii)
∑X2Y = a∑X2 + b1∑X1X2 + b2∑X2² …… (iv)
Solving the normal equations (ii), (iii) and (iv) gives the values of a, b1 and b2. Substituting these values in equation (i) gives the required regression line of Y on X1 and X2.
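The procedure described here (form the sums, write the three normal equations, solve for a, b1 and b2) can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `solve_normal_equations` and the toy data are hypothetical, and Gauss-Jordan elimination with exact fractions is just one convenient way to solve the 3×3 system.

```python
from fractions import Fraction


def solve_normal_equations(y, x1, x2):
    """Return (a, b1, b2) for Y = a + b1*X1 + b2*X2 via the normal equations.

    Hypothetical helper for illustration; raises StopIteration if the
    system is singular (for example when X1 and X2 are perfectly collinear).
    """
    n = len(y)
    sx1, sx2, sy = sum(x1), sum(x2), sum(y)
    sx1x1 = sum(u * u for u in x1)
    sx2x2 = sum(v * v for v in x2)
    sx1x2 = sum(u * v for u, v in zip(x1, x2))
    sx1y = sum(u * w for u, w in zip(x1, y))
    sx2y = sum(v * w for v, w in zip(x2, y))

    # Augmented matrix: one row per normal equation (ii), (iii), (iv).
    rows = [
        [n, sx1, sx2, sy],
        [sx1, sx1x1, sx1x2, sx1y],
        [sx2, sx1x2, sx2x2, sx2y],
    ]
    m = [[Fraction(v) for v in row] for row in rows]

    # Gauss-Jordan elimination using exact rational arithmetic.
    for col in range(3):
        pivot = next(r for r in range(col, 3) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(3):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return tuple(float(m[r][3]) for r in range(3))


# Toy data (made up for this sketch) generated exactly from Y = 1 + 2*X1 + 3*X2.
toy_x1 = [0, 1, 0, 1, 2]
toy_x2 = [0, 0, 1, 1, 1]
toy_y = [1, 3, 4, 6, 8]
print(solve_normal_equations(toy_y, toy_x1, toy_x2))
```

Because the toy data lie exactly on a plane, the fitted coefficients recover the generating values; with real data the same steps give the least squares estimates.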
Coefficient of Multiple Determination (R²)
It measures the proportion of the total variation in the dependent variable that is explained by the variation in the independent variables.
Its value ranges from 0 to 1.

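Standard Error of Estimate (S_YX)
It measures the amount of variation of the observed values of Y around the fitted regression line.
$$S_{YX} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - k - 1}} = \sqrt{\frac{\sum Y^2 - a\sum Y - b_1\sum X_1 Y - b_2\sum X_2 Y}{n - k - 1}}$$
where n is the number of observations and k is the number of independent variables.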
Multiple Regression
The following table gives the annual food expenditure, annual income and family size for a sample of 10 families.

Annual food expenditure (Rs. '00):   24    8   16   18   24   23   11   15   21   20
Annual income (Rs. '000):            11    3    4    7    9    8    5    7    8    7
Family size:                          6    2    1    3    5    4    2    2    3    2
i. Find the least squares equation to predict annual food expenditure from annual income and family size.
ii. Predict the annual food expenditure of a family with an annual income of Rs. 18,000 and a family size of 5.
iii. Interpret the regression coefficients.
iv. Calculate the coefficient of multiple determination and the multiple correlation coefficient, and interpret the results.
v. Calculate the standard error of estimate and interpret the result.
Solution
Let Y = annual food expenditure (in Rs. '00), X1 = annual income (in Rs. '000) and X2 = family size.
Consider the regression line of Y on X1 and X2:
Y = a + b1X1 + b2X2 …… (i)
The values of the parameters a, b1 and b2 are obtained from the normal equations, which follow from the principle of least squares:
∑Y = na + b1∑X1 + b2∑X2 …… (ii)
∑X1Y = a∑X1 + b1∑X1² + b2∑X1X2 …… (iii)
∑X2Y = a∑X2 + b1∑X1X2 + b2∑X2² …… (iv)
   Y    X1    X2    X1Y    X2Y   X1X2    X1²    X2²     Y²
  24    11     6    264    144     66    121     36    576
   8     3     2     24     16      6      9      4     64
  16     4     1     64     16      4     16      1    256
  18     7     3    126     54     21     49      9    324
  24     9     5    216    120     45     81     25    576
  23     8     4    184     92     32     64     16    529
  11     5     2     55     22     10     25      4    121
  15     7     2    105     30     14     49      4    225
  21     8     3    168     63     24     64      9    441
  20     7     2    140     40     14     49      4    400
----------------------------------------------------------
 180    69    30   1346    597    236    527    112   3512

Totals: n = 10, ∑Y = 180, ∑X1 = 69, ∑X2 = 30, ∑X1Y = 1346, ∑X2Y = 597, ∑X1X2 = 236, ∑X1² = 527, ∑X2² = 112, ∑Y² = 3512
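The column totals in this table can be checked mechanically. The following short Python sketch recomputes every sum from the ten observations; the variable and key names are illustrative choices, not notation from the slides.

```python
# Sample data from the table above (10 families).
y = [24, 8, 16, 18, 24, 23, 11, 15, 21, 20]   # food expenditure, Rs. '00
x1 = [11, 3, 4, 7, 9, 8, 5, 7, 8, 7]          # annual income, Rs. '000
x2 = [6, 2, 1, 3, 5, 4, 2, 2, 3, 2]           # family size


def sum_products(u, v):
    """Sum of element-wise products of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


sums = {
    "n": len(y),
    "sum_y": sum(y),
    "sum_x1": sum(x1),
    "sum_x2": sum(x2),
    "sum_x1y": sum_products(x1, y),
    "sum_x2y": sum_products(x2, y),
    "sum_x1x2": sum_products(x1, x2),
    "sum_x1_sq": sum_products(x1, x1),
    "sum_x2_sq": sum_products(x2, x2),
    "sum_y_sq": sum_products(y, y),
}

for name, value in sums.items():
    print(f"{name:>10} = {value}")
```

These ten numbers are all that the normal equations, R² and the standard error of estimate need.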
Substituting these sums, the normal equations become
180 = 10a + 69b1 + 30b2 …… (v)
1346 = 69a + 527b1 + 236b2 …… (vi)
597 = 30a + 236b1 + 112b2 …… (vii)
Multiplying equation (v) by 69 and equation (vi) by 10 and subtracting, we get
12420 = 690a + 4761b1 + 2070b2
13460 = 690a + 5270b1 + 2360b2
-  -  -  -
-1040 = -509b1 - 290b2
or 1040 = 509b1 + 290b2 …… (viii)
Multiplying equation (v) by 3 and subtracting equation (vii), we get
540 = 30a + 207b1 + 90b2
597 = 30a + 236b1 + 112b2
-  -  -  -
-57 = -29b1 - 22b2
or 57 = 29b1 + 22b2 …… (ix)
Multiplying equation (viii) by 22 and equation (ix) by 290 and subtracting, we get
22880 = 11198b1 + 6380b2
16530 = 8410b1 + 6380b2
-  -  -
6350 = 2788b1
b1 = 6350/2788 = 2.2776, taken as 2.27 (truncated to two decimal places)
Putting the value of b1 in equation (ix), we get
57 = 29×2.27 + 22b2
-22b2 = 65.83 - 57 = 8.83
b2 = -8.83/22 = -0.40
Substituting the values of b1 and b2 in equation (v), we get
180 = 10a + 69×2.27 + 30×(-0.40)
10a = 180 - 156.63 + 12 = 35.37
a = 3.54
Substituting the values of a, b1 and b2 in equation (i), we get the required regression line
Ŷ = 3.54 + 2.27X1 - 0.40X2
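As a cross-check on the elimination, the system (v), (vi), (vii) can also be solved without intermediate rounding. The sketch below uses Cramer's rule with exact fractions; this is a swapped-in technique chosen for brevity, not the method used on the slides, and the helper names `det3` and `replace_column` are hypothetical.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def replace_column(matrix, col, values):
    """Copy of `matrix` with column `col` replaced by `values`."""
    return [[values[r] if c == col else matrix[r][c] for c in range(3)]
            for r in range(3)]


# Coefficients and right-hand sides of equations (v), (vi), (vii).
A = [[10, 69, 30],
     [69, 527, 236],
     [30, 236, 112]]
rhs = [180, 1346, 597]

D = det3(A)
a, b1, b2 = (Fraction(det3(replace_column(A, k, rhs)), D) for k in range(3))

print(f"a  = {float(a):.4f}")
print(f"b1 = {float(b1):.4f}")
print(f"b2 = {float(b2):.4f}")
```

The unrounded solution is roughly a = 3.52, b1 = 2.28, b2 = -0.41. The small differences from the hand-computed 3.54, 2.27 and -0.40 come from truncating b1 to 2.27 before back-substitution; the worked example continues with the hand-computed values.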
ii) When annual income is Rs. 18,000, i.e. X1 = 18 (in Rs. '000), and family size is X2 = 5, the estimated annual food expenditure is
Ŷ = 3.54 + 2.27×18 - 0.40×5 = 42.40 (in Rs. '00) = Rs. 4,240
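The prediction step is a direct substitution into the fitted line, which a few lines of Python make explicit. The function name `predict_food_expenditure` is a hypothetical label for this sketch; the coefficients are the hand-computed values from the solution.

```python
# Hand-computed coefficients of the fitted line Y = a + b1*X1 + b2*X2.
A, B1, B2 = 3.54, 2.27, -0.40


def predict_food_expenditure(income_thousands, family_size):
    """Estimated annual food expenditure in hundreds of rupees."""
    return A + B1 * income_thousands + B2 * family_size


y_hat = predict_food_expenditure(18, 5)
print(f"{y_hat:.2f} (Rs. '00) = Rs. {y_hat * 100:.0f}")
# prints: 42.40 (Rs. '00) = Rs. 4240
```

Note that an income of 18 (Rs. '000) lies outside the sample range of 3 to 11, so this estimate is an extrapolation and should be read with some caution.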
iii) b1 = 2.27 means that, on average, annual food expenditure (Y) increases by 2.27 units (Rs. 227) for each one-unit (Rs. 1,000) increase in annual income (X1), holding family size (X2) constant.
b2 = -0.40 means that, on average, annual food expenditure (Y) decreases by 0.40 units (Rs. 40) for each additional family member (X2), holding annual income (X1) constant.
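iv) Coefficient of multiple determination (R²):
$$R^2 = \frac{a\sum Y + b_1\sum X_1 Y + b_2\sum X_2 Y - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$$
$$= \frac{3.54 \times 180 + 2.27 \times 1346 + (-0.40) \times 597 - 10 \times 18^2}{3512 - 10 \times 18^2}$$
$$= \frac{213.82}{272} = 0.7861, \text{ i.e. } 78.61\%$$
where Ȳ = ∑Y/n = 180/10 = 18.
It means that 78.61% of the total variation in Y is explained by the variation in the independent variables X1 and X2; the remaining 21.39% is not explained by the model and is due to other factors.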
Coefficient of multiple correlation (R):
$$R = \sqrt{R^2} = \sqrt{0.7861} = 0.8866$$
This indicates a high degree of correlation between Y and the independent variables X1 and X2 taken together.
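The arithmetic for R² and R can be reproduced from the column totals and the hand-computed coefficients. The sketch below follows the same shortcut formula as the solution; the variable names are illustrative.

```python
import math

# Column totals from the computation table.
n, sum_y, sum_y_sq = 10, 180, 3512
sum_x1y, sum_x2y = 1346, 597

# Hand-computed regression coefficients.
a, b1, b2 = 3.54, 2.27, -0.40

y_bar = sum_y / n                                  # mean of Y = 18
explained = a * sum_y + b1 * sum_x1y + b2 * sum_x2y - n * y_bar ** 2
total = sum_y_sq - n * y_bar ** 2                  # total variation in Y

r_squared = explained / total
r = math.sqrt(r_squared)

print(f"explained = {explained:.2f}, total = {total:.0f}")
print(f"R^2 = {r_squared:.4f}, R = {r:.4f}")
```

With the rounded coefficients this gives R² = 0.7861 and R = 0.8866, matching the hand calculation; using unrounded coefficients would change R² only in the third decimal place.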
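v) Standard error of estimate (S_YX):
$$S_{YX} = \sqrt{\frac{\sum Y^2 - a\sum Y - b_1\sum X_1 Y - b_2\sum X_2 Y}{n-k-1}}$$
$$= \sqrt{\frac{3512 - 3.54 \times 180 - 2.27 \times 1346 - (-0.40) \times 597}{10-3}} = \sqrt{\frac{58.18}{7}} = 2.88$$
where k = 2 is the number of independent variables, so n - k - 1 = 10 - 3 = 7.
It means that the observed values of Y vary, on average, by about 2.88 units (Rs. 288) around the fitted regression line.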
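The standard error of estimate can be computed in two ways: with the shortcut formula used in the solution, or directly from the residuals Y - Ŷ. The sketch below does both with the hand-computed coefficients; the variable names are illustrative and not taken from the slides.

```python
import math

y = [24, 8, 16, 18, 24, 23, 11, 15, 21, 20]   # food expenditure, Rs. '00
x1 = [11, 3, 4, 7, 9, 8, 5, 7, 8, 7]          # annual income, Rs. '000
x2 = [6, 2, 1, 3, 5, 4, 2, 2, 3, 2]           # family size

a, b1, b2 = 3.54, 2.27, -0.40                 # hand-computed coefficients
n, k = len(y), 2                              # observations, predictors

# Shortcut formula based on the column totals.
sum_y = sum(y)
sum_y_sq = sum(v * v for v in y)
sum_x1y = sum(u * v for u, v in zip(x1, y))
sum_x2y = sum(u * v for u, v in zip(x2, y))
sse_shortcut = sum_y_sq - a * sum_y - b1 * sum_x1y - b2 * sum_x2y
se_shortcut = math.sqrt(sse_shortcut / (n - k - 1))

# Direct computation from the residuals of the fitted line.
residuals = [yi - (a + b1 * u + b2 * v) for yi, u, v in zip(y, x1, x2)]
sse_direct = sum(r * r for r in residuals)
se_direct = math.sqrt(sse_direct / (n - k - 1))

print(round(se_shortcut, 2), round(se_direct, 2))
# prints: 2.88 2.89
```

The two results differ slightly in the second decimal because the shortcut formula assumes the coefficients satisfy the normal equations exactly, which the rounded values 3.54, 2.27 and -0.40 do only approximately.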
