
Regression: In regression, we express the relation between two variables as a cause-and-effect relationship, which can be written as the linear equation

Y = a + b X

where Y is the dependent variable, X is the independent variable, and a and b are the regression constants (a is the intercept and b is the slope). This is called the simple linear regression equation of Y on X. It can be used to forecast or predict the value of the dependent variable Y for a given value of the independent variable X.

Example: Y = 1 + 2 X

For a given set of (X, Y) points, many lines can be drawn through them, but only one line is closest to the points; this is called the best-fit line. The values of a and b are found by the method of least squares. In this method, we minimise the value of ∑e², where e is the difference between the Y coordinate of a plotted point and the Y coordinate of the point on the straight line at the same X.

The formulae for a and b can be written as

b = ∑xy / ∑x², a = Y bar – b X bar

where x = X – X bar and y = Y – Y bar are the deviations of X and Y from their means.

These values of a & b can be substituted in the equation Y = a + b X and this equation can be
used to forecast the value of Y for some given value of X.
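The two formulae can be sketched in a few lines of Python. This is only an illustration: the function names `fit_line` and `forecast` are made up for this sketch, and the sample points are chosen to lie exactly on the example line Y = 1 + 2 X.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviations from the means: x = X - X bar, y = Y - Y bar.
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy / sum_x2
    a = y_bar - b * x_bar
    return a, b


def forecast(a, b, x):
    """Predict Y for a given X from the fitted line."""
    return a + b * x


# Points taken from the example line Y = 1 + 2 X.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, forecast(a, b, 10))
```

Because the sample points lie exactly on a line, the fit recovers that line; with real data the same code returns the best-fit line instead.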

Q1. For the following data, find the simple linear regression equation of Y on X and forecast
Y when X = 20.

X     Y     x = X – X bar   y = Y – Y bar   xy    x²    y²    Y cap    e = Y – Y cap   e²
5     12    -4              -5              20    16    25    13.48    -1.48           2.20
7     15    -2              -2              4     4     4     15.24    -0.24           0.06
8     17    -1              0               0     1     0     16.12    0.88            0.77
10    20    1               3               3     1     9     17.88    2.12            4.50
15    21    6               4               24    36    16    22.28    -1.28           1.63
Total of e² = 9.16

X bar = 45/5 = 9, Y bar = 85/5 = 17

∑xy = 51, ∑x² = 58, ∑y² = 54

b = ∑xy / ∑x² = 51/58 = 0.8793, a = Y bar – b X bar = 17 – (0.8793)(9) = 9.086

Hence the simple linear regression equation of Y on X is Y = 9.086 + 0.8793 X

Putting X =20, we get Y = 9.086 + 0.8793(20) = 26.672
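As a rough check of the least-squares idea on the Q1 data, the sketch below recomputes a and b and compares ∑e² for the fitted line with ∑e² for a few slightly shifted lines. The helper name `sse` and the sizes of the shifts are arbitrary choices for illustration.

```python
X = [5, 7, 8, 10, 15]
Y = [12, 15, 17, 20, 21]


def sse(a, b, xs, ys):
    """Sum of squared residuals e = Y - (a + b*X) over all points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_x2 = sum((x - x_bar) ** 2 for x in X)
b = sum_xy / sum_x2
a = y_bar - b * x_bar

best = sse(a, b, X, Y)
# Shifting the intercept or slope away from the fitted values
# should only increase the total squared error.
nudged = [sse(a + da, b + db, X, Y)
          for da in (-0.5, 0.5) for db in (-0.05, 0.05)]
print(round(a, 3), round(b, 4), round(best, 3))
print(all(s > best for s in nudged))
```

With unrounded coefficients the total of e² comes out at about 9.155, and every shifted line gives a larger total.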

Q3. Calculate the regression of Y on X. Forecast the value of Y when X =25

X : 10 12 15 23 20

Y : 14 17 23 25 21

Solution: X bar = ∑X /n = 80/5 = 16, Y bar = ∑Y /n = 100/5 = 20

x = X – X bar:  -6  -4  -1   7   4

y = Y – Y bar:  -6  -3   3   5   1

xy:             36  12  -3  35   4

x²:             36  16   1  49  16

y²:             36   9   9  25   1

∑xy = 84, ∑x² = 118, ∑y² = 80

b = ∑xy / ∑x² = 84 / 118 = 0.71, a = Y bar – b X bar = 20 – (0.71)(16) = 8.64

Simple linear regression equation of Y on X is

Y = a + b X i.e. Y = 8.64 + 0.71 X

When X =25, Y = 8.64 + 0.71(25)= 26.39


Q4. Find Karl Pearson’s coefficient of correlation and the equation of the simple linear
regression line for the following data. Forecast Y when X is 10.
X 9 8 7 6 5 4 3 2 1
Y 15 16 14 13 11 12 10 8 9
Solution:
X     Y     x = X – X bar   y = Y – Y bar   xy    x²    y²
9     15    4               3               12    16    9
8     16    3               4               12    9     16
7     14    2               2               4     4     4
6     13    1               1               1     1     1
5     11    0               -1              0     0     1
4     12    -1              0               0     1     0
3     10    -2              -2              4     4     4
2     8     -3              -4              12    9     16
1     9     -4              -3              12    16    9

X bar = 45/9 = 5, Y bar = 108/9 = 12

∑xy = 57, ∑x² = 60, ∑y² = 60

b = ∑xy / ∑x² = 57 / 60 = 0.95, a = Y bar – b X bar = 12 – (0.95)(5) = 7.25

Simple linear regression equation of Y on X is

Y = a + b X i.e. Y = 7.25 + 0.95 X

When X =10, Y = 7.25 + 0.95(10) = 16.75

r = ∑xy / √(∑x² × ∑y²) = 57 / √(60 × 60) = 57/60 = 0.95


This means that there is a very high positive correlation between X and Y.
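The correlation formula used in Q4 can be sketched as a small Python function. The name `pearson_r` is chosen for this illustration only; the data are the Q4 values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x and y are deviations from the respective means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    sum_y2 = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)


X = [9, 8, 7, 6, 5, 4, 3, 2, 1]
Y = [15, 16, 14, 13, 11, 12, 10, 8, 9]
r = pearson_r(X, Y)
print(round(r, 2))
```

Note that r always lies between -1 and 1, and that its sign matches the sign of the slope b, since both share the numerator ∑xy.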

Q5. Following are the average prices of a particular stock and the values of Stock Exchange
index for 6 years:

Stock price X (Rs.) Index Y


245 307
255 322
240 337
390 310
655 350
393 360
Calculate the coefficient of correlation between the share price and the SE index. Also find
the simple linear regression equation of Y on X.

Solution:
X     Y     x = X – X bar   y = Y – Y bar   xy      x²       y²     Y cap     e = Y – Y cap   e²
245   307   -118            -24             2832    13924    576    322.74    -15.74          247.75
255   322   -108            -9              972     11664    81     323.44    -1.44           2.07
240   337   -123            6               -738    15129    36     322.39    14.61           213.45
390   310   27              -21             -567    729      441    332.89    -22.89          523.95
655   350   292             19              5548    85264    361    351.44    -1.44           2.07
393   360   30              29              870     900      841    333.10    26.90           723.61

X bar = 2178/6 = 363, Y bar = 1986/6 = 331

∑xy = 8917, ∑x² = 127610, ∑y² = 2336, Total of e² = 1712.9

b = ∑xy / ∑x² = 8917 / 127610 = 0.07, a = Y bar – b X bar = 331 – (0.07)(363) = 305.59

Simple linear regression equation of Y on X is

Y = a + b X i.e. Y = 305.59 + 0.07 X

r = ∑xy / √(∑x² × ∑y²) = 8917 / √(127610 × 2336) = 8917/(357.22 × 48.33) = 0.516

This means that there is a moderate positive correlation between X and Y.
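Because the slope in Q5 is small and X bar is large, rounding b to two decimals shifts the intercept slightly. The sketch below, using the Q5 data, compares the coefficients with and without rounding; the variable names are illustrative.

```python
X = [245, 255, 240, 390, 655, 393]
Y = [307, 322, 337, 310, 350, 360]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_x2 = sum((x - x_bar) ** 2 for x in X)

# Coefficients at full precision.
b_full = sum_xy / sum_x2
a_full = y_bar - b_full * x_bar

# Coefficients when b is rounded to two decimals before computing a,
# as in the hand calculation.
b_rounded = round(b_full, 2)
a_rounded = y_bar - b_rounded * x_bar

print(round(b_full, 6), round(a_full, 4))
print(b_rounded, round(a_rounded, 2))
```

Both versions pass through the point (X bar, Y bar), so forecasts near the mean agree closely; the difference grows for X values far from 363.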

Q6. Find Karl Pearson’s coefficient of correlation and the equation of the simple linear
regression line for the following data.
Age X 56 42 36 47 49 42 60 68
Blood pressure Y 147 125 118 128 145 140 155 162

Solution:
X     Y     x = X – X bar   y = Y – Y bar   xy     x²     y²
56    147   6               7               42     36     49
42    125   -8              -15             120    64     225
36    118   -14             -22             308    196    484
47    128   -3              -12             36     9      144
49    145   -1              5               -5     1      25
42    140   -8              0               0      64     0
60    155   10              15              150    100    225
68    162   18              22              396    324    484

X bar = 400/8 = 50, Y bar = 1120/8 = 140

∑xy = 1047, ∑x² = 794, ∑y² = 1636

b = ∑xy / ∑x² = 1047 / 794 = 1.319, a = Y bar – b X bar = 140 – (1.319)(50) = 74.05

Simple linear regression equation of Y on X is

Y = a + b X i.e. Y = 74.05 + 1.319 X

r = ∑xy / √(∑x² × ∑y²) = 1047 / √(794 × 1636) = 1047/1139.73 = 0.919

This means that there is a high positive correlation between X and Y.
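The working table used in these solutions (columns x, y, xy, x², y²) can be generated mechanically. The sketch below builds it for the Q6 data; `working_table` is a hypothetical helper name, not something defined in the notes.

```python
def working_table(xs, ys):
    """Return the rows (X, Y, x, y, xy, x^2, y^2) and the column
    totals (sum xy, sum x^2, sum y^2), with x and y taken as
    deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    rows = []
    for X, Y in zip(xs, ys):
        x, y = X - x_bar, Y - y_bar
        rows.append((X, Y, x, y, x * y, x * x, y * y))
    totals = tuple(sum(row[i] for row in rows) for i in (4, 5, 6))
    return rows, totals


age = [56, 42, 36, 47, 49, 42, 60, 68]
blood_pressure = [147, 125, 118, 128, 145, 140, 155, 162]

rows, (sum_xy, sum_x2, sum_y2) = working_table(age, blood_pressure)
for row in rows:
    print(*row)
print("totals:", sum_xy, sum_x2, sum_y2)
```

The three totals are all that the formulae for b, a and r need, which is why every solution in these notes ends its table with them.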

Q7. A research project was undertaken to determine if there is a relationship between the
years of experience on the job (X) and efficiency rating of employees (Y). The objective of
the study was to predict the efficiency rating of the employee. The sample results are as
follows:

Years on the job (X) 1 20 6 8 2 1 14 8 4 6

Efficiency rating (Y) 6 5 3 5 2 2 4 5 4 4

(a) Find the correlation coefficient between X and Y.

(b) Find the linear regression of Y on X.

Solution:

X     Y     x = X – X bar   y = Y – Y bar   xy     x²     y²
1     6     -6              2               -12    36     4
20    5     13              1               13     169    1
6     3     -1              -1              1      1      1
8     5     1               1               1      1      1
2     2     -5              -2              10     25     4
1     2     -6              -2              12     36     4
14    4     7               0               0      49     0
8     5     1               1               1      1      1
4     4     -3              0               0      9      0
6     4     -1              0               0      1      0

X bar = 70/10 = 7, Y bar = 40/10 = 4

∑xy = 26, ∑x² = 328, ∑y² = 16

b = ∑xy / ∑x² = 26 / 328 = 0.079, a = Y bar – b X bar = 4 – (0.079)(7) = 3.447

Simple linear regression equation of Y on X is

Y = a + b X i.e. Y = 3.447 + 0.079 X

r = ∑xy / √(∑x² × ∑y²) = 26 / √(328 × 16) = 26/(18.11 × 4) = 0.36

This means that there is a low positive correlation between X and Y.

Q8. Quinine may be determined by measuring the fluorescence intensity in 1 M sulphuric acid. Standard solutions of quinine gave the following fluorescence values. Calculate the correlation coefficient.

Concentration of quinine Y   0.00   0.10   0.20   0.30    0.40

Fluorescence intensity X     0.00   5.20   9.80   12.30   17.70

If the intensity was observed to be 14.85, what is the concentration of quinine Y likely to be in the solution?

Solution:

X       Y      x = X – X bar   y = Y – Y bar   xy      x²       y²
0       0      -9              -0.2            1.8     81       0.04
5.2     0.1    -3.8            -0.1            0.38    14.44    0.01
9.8     0.2    0.8             0               0       0.64     0
12.3    0.3    3.3             0.1             0.33    10.89    0.01
17.7    0.4    8.7             0.2             1.74    75.69    0.04

X bar = 45/5 = 9, Y bar = 1/5 = 0.2

∑xy = 4.25, ∑x² = 182.66, ∑y² = 0.1

b = ∑xy / ∑x² = 4.25 / 182.66 = 0.023

a = Y bar – b X bar = 0.2 – (0.023)(9) = -0.007

Simple linear regression equation of Y on X is

Y = a + b X i.e. Y = -0.007 + 0.023 X

Put X = 14.85, Y = -0.007 + (0.023)(14.85) = 0.33455

r = ∑xy / √(∑x² × ∑y²) = 4.25 / √(182.66 × 0.1) = 4.25/√18.266 = 4.25/4.274 = 0.994

This means that there is a very high positive correlation between X and Y.
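The Q8 calculation amounts to fitting a calibration line and reading a concentration off it. A minimal sketch follows, assuming the intensity values as tabulated in the solution (last reading 17.7); the variable names are illustrative.

```python
from math import sqrt

intensity = [0.0, 5.2, 9.8, 12.3, 17.7]       # X, as tabulated in the solution
concentration = [0.0, 0.1, 0.2, 0.3, 0.4]     # Y

n = len(intensity)
x_bar = sum(intensity) / n
y_bar = sum(concentration) / n
sum_xy = sum((x - x_bar) * (y - y_bar)
             for x, y in zip(intensity, concentration))
sum_x2 = sum((x - x_bar) ** 2 for x in intensity)
sum_y2 = sum((y - y_bar) ** 2 for y in concentration)

b = sum_xy / sum_x2              # slope of Y on X
a = y_bar - b * x_bar            # intercept
r = sum_xy / sqrt(sum_x2 * sum_y2)

# Concentration predicted for an observed intensity of 14.85.
predicted = a + b * 14.85
print(round(b, 4), round(a, 4), round(r, 3), round(predicted, 3))
```

With unrounded coefficients the predicted concentration is about 0.336, close to the hand-calculated value obtained with b rounded to 0.023.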

Q9. The manufacturers of a particular brand of chocolate were interested in examining the
relationship between the sales of chocolates and shelf space allocated to that brand of
chocolate by various stores. Data from 10 stores are as follows:

Sales (Rs in thousands) Y   25   15    28    30    17    16    12    21    19    27

Shelf space (sq ft) X       5    3.2   5.4   6.1   4.3   3.1   2.6   6.4   4.9   6

Determine the regression to predict sales using shelf space as the independent variable. Also
find the Karl Pearson’s correlation coefficient between X and Y.

Solution:

X      Y     x = X – X bar   y = Y – Y bar   xy      x²      y²
5      25    0.3             4               1.2     0.09    16
3.2    15    -1.5            -6              9       2.25    36
5.4    28    0.7             7               4.9     0.49    49
6.1    30    1.4             9               12.6    1.96    81
4.3    17    -0.4            -4              1.6     0.16    16
3.1    16    -1.6            -5              8       2.56    25
2.6    12    -2.1            -9              18.9    4.41    81
6.4    21    1.7             0               0       2.89    0
4.9    19    0.2             -2              -0.4    0.04    4
6      27    1.3             6               7.8     1.69    36

Y bar = 210/10 = 21, X bar = 47/10 = 4.7

∑xy = 63.6, ∑x² = 16.54, ∑y² = 344

b = ∑xy / ∑x² = 63.6 / 16.54 = 3.845, a = Y bar – b X bar = 21 – (3.845)(4.7) = 2.9285

Equation is Y = a + b X i.e. Y = 2.9285 + 3.845 X

r = ∑xy / √(∑x² × ∑y²) = 63.6 / √(16.54 × 344) = 63.6/√5689.76 = 63.6/75.43 = 0.843

This means that there is a high positive correlation between X and Y.

Multiple linear regression


Here the dependent variable Y depends on more than one independent variable. The equation is of the form
Y = a + b1 X1 + b2 X2 + … + bn Xn
where X1, X2, …, Xn are the independent variables.
With two independent variables, the values of a, b1 and b2 are obtained from three normal equations:
∑Y = n a + b1 ∑X1 + b2 ∑X2
∑X1Y = a ∑X1 + b1 ∑X1² + b2 ∑X1X2
∑X2Y = a ∑X2 + b1 ∑X1X2 + b2 ∑X2²
Consider the following example:

The data below shows the profit (in Rs.’000), sales (in Rs. Lakhs) and advertising expenditure (in Rs.’00). Find the multiple regression equation of profit on sales and advertising expenditure.

Sales (X1)   Advertising expenditure (X2)   Profit (Y)   X1²     X1X2   X2²    X1Y    X2Y
24           16                             10           576     384    256    240    160
35           17                             11           1225    595    289    385    187
38           18                             12           1444    684    324    456    216
41           19                             13           1681    779    361    533    247
42           20                             14           1764    840    400    588    280

∑X1 = 180, ∑X2 = 90, ∑Y = 60, ∑X1² = 6690, ∑X1X2 = 3282, ∑X2² = 1630, ∑X1Y = 2202, ∑X2Y = 1090, n = 5

Substituting in the above equations, we get

60= 5a + 180 b1 + 90 b2

2202= 180a + 6690 b1 + 3282 b2

1090= 90a + 3282 b1 + 1630 b2

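Solving the above equations gives a = -6, b1 = 0 and b2 = 1, which we substitute in the equation

Y = a + b1X1 + b2X2
Y = -6 + 0 X1 + 1 X2, i.e. Y = X2 – 6

Check: these values satisfy all three equations (for the first, 5(-6) + 180(0) + 90(1) = 60). In this data every profit value is exactly 6 less than the advertising expenditure, so sales add nothing further to the fit.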
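The three normal equations can also be solved mechanically. The sketch below rebuilds the sums from the data in the table and solves the 3 × 3 system by Gauss-Jordan elimination, using exact fractions to avoid rounding. The helper name `dot` and the elimination loop are one possible approach, not the only one.

```python
from fractions import Fraction

X1 = [24, 35, 38, 41, 42]   # sales
X2 = [16, 17, 18, 19, 20]   # advertising expenditure
Y = [10, 11, 12, 13, 14]    # profit
n = len(Y)


def dot(u, v):
    """Sum of element-wise products, e.g. dot(X1, Y) is sum of X1*Y."""
    return sum(p * q for p, q in zip(u, v))


# Augmented matrix [coefficients | right-hand side] of the normal equations.
rows = [
    [n,       sum(X1),     sum(X2),     sum(Y)],
    [sum(X1), dot(X1, X1), dot(X1, X2), dot(X1, Y)],
    [sum(X2), dot(X1, X2), dot(X2, X2), dot(X2, Y)],
]
M = [[Fraction(v) for v in row] for row in rows]

# Gauss-Jordan elimination with exact arithmetic.
for i in range(3):
    pivot_row = next(r for r in range(i, 3) if M[r][i] != 0)
    M[i], M[pivot_row] = M[pivot_row], M[i]
    pivot = M[i][i]
    M[i] = [v / pivot for v in M[i]]
    for r in range(3):
        if r != i:
            factor = M[r][i]
            M[r] = [vr - factor * vi for vr, vi in zip(M[r], M[i])]

a, b1, b2 = (M[i][3] for i in range(3))
print(a, b1, b2)
```

For this data the exact solution is a = -6, b1 = 0, b2 = 1, which can be confirmed by substituting back into each of the three equations.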
