
Plane of Regression

UNIT 10 PLANE OF REGRESSION


Structure
10.1 Introduction
Objectives
10.2 Yule’s Notation
10.3 Plane of Regression for three Variables
10.4 Properties of Residuals
10.5 Variance of the Residuals
10.6 Summary
10.7 Solutions / Answers

10.1 INTRODUCTION

In Unit 9, you learnt the concepts of regression, linear regression, lines of
regression and the regression coefficient with its properties. Unit 9 was based
on linear regression, in which we considered only two variables. In Unit 6
of MST-002, you studied correlation, which measures the linear relationship
between two variables. In many situations, we are interested in studying the
relationship among more than two variables, in which one variable is
influenced by many others. For example, the production of any crop depends
upon soil fertility, fertilizers used, irrigation methods, weather conditions,
etc. Similarly, marks obtained by students in an exam may depend upon their
IQ, attendance in class, study at home, etc. In these types of situations, we
study multiple and partial correlations. In this unit you will study the
basics of multiple and partial correlations, which include Yule’s notation,
the plane of regression, residuals, properties of residuals and the variance of
residuals.
The study of more than two variables needs many more notations than were
used in Unit 9. These notations were given by G.U. Yule (1897). Yule’s
notation and residuals are described in Section 10.2. The plane of regression
and the normal equations are given in Section 10.3. Properties of residuals are
explored in Section 10.4, whereas Section 10.5 explains the variance of
residuals.
Objectives
After reading this unit, you would be able to
• define Yule’s notation;
• describe the plane of regression for three variables;
• explain the properties of residuals;
• describe the variance of the residuals;
• find out the lines of regression; and
• find out the estimates of the dependent variable from the regression lines.
Regression and Multiple
Correlation
10.2 YULE’S NOTATION
Karl Pearson developed the theory of multiple and partial correlation for
three variables and it was generalized by G.U. Yule (1897). Let us consider
three random variables $X_1$, $X_2$ and $X_3$ with respective means
$\bar{X}_1$, $\bar{X}_2$ and $\bar{X}_3$. Then the regression equation of
$X_1$ on $X_2$ and $X_3$ is defined as

$$X_1 = a + b_{12.3} X_2 + b_{13.2} X_3 \tag{1}$$

where $b_{12.3}$ and $b_{13.2}$ are the partial regression coefficients of $X_1$
on $X_2$ (keeping the effect of $X_3$ fixed) and of $X_1$ on $X_3$ (keeping the
effect of $X_2$ fixed), respectively.
Taking the summation of equation (1) over all N observations and dividing it
by N, we get

$$\bar{X}_1 = a + b_{12.3} \bar{X}_2 + b_{13.2} \bar{X}_3 \tag{2}$$

On subtracting equation (2) from equation (1), we get

$$X_1 - \bar{X}_1 = b_{12.3}(X_2 - \bar{X}_2) + b_{13.2}(X_3 - \bar{X}_3) \tag{3}$$

Suppose $X_1 - \bar{X}_1 = x_1$, $X_2 - \bar{X}_2 = x_2$ and $X_3 - \bar{X}_3 = x_3$.

Now the plane of regression of $x_1$ on $x_2$ and $x_3$ (equation (3)) can be
rewritten as

$$x_1 = b_{12.3} x_2 + b_{13.2} x_3 \tag{4}$$

The right hand side of equation (4) is called the estimate of $x_1$, which is
denoted by $\hat{x}_{1.23}$. Thus,

$$\hat{x}_{1.23} = b_{12.3} x_2 + b_{13.2} x_3 \tag{5}$$

The error of estimate, or residual, is defined as

$$e_{1.23} = x_1 - \hat{x}_{1.23} = x_1 - b_{12.3} x_2 - b_{13.2} x_3 \tag{6}$$

This residual is of order 2.
If we consider n variables $x_1, x_2, \ldots, x_n$, the equation of the plane of
regression of $x_1$ on $x_2, \ldots, x_n$ is

$$x_1 = b_{12.34\ldots n}\, x_2 + b_{13.24\ldots n}\, x_3 + \ldots + b_{1n.23\ldots(n-1)}\, x_n \tag{7}$$

and the error of estimate, or residual, is

$$x_{1.23\ldots n} = x_1 - \left( b_{12.34\ldots n}\, x_2 + b_{13.24\ldots n}\, x_3 + \ldots + b_{1n.23\ldots(n-1)}\, x_n \right) \tag{8}$$

Note: In the above expressions we have used subscripts involving the digits
1, 2, 3, …, n and a dot (.). Subscripts before the dot are known as primary
subscripts, whereas subscripts after the dot are called secondary subscripts.
The number of secondary subscripts decides the order of the regression
coefficient. For example, $b_{12.3}$ is a regression coefficient of order 1,
$b_{12.34}$ is of order 2, and so on; $b_{1n.23\ldots(n-1)}$ is of order $(n-2)$.

The order in which the secondary subscripts are written ($b_{12.34}$ or
$b_{12.43}$) is immaterial, but the order of the primary subscripts is very
important and decides the dependent and independent variables. In $b_{12.34}$,
$x_1$ is the dependent variable and $x_2$ is the independent variable, whereas
in $b_{21.34}$, $x_2$ is the dependent variable and $x_1$ is the independent
variable.
The order of a residual is also determined by the number of secondary
subscripts. For example, $x_{1.23}$ is a residual of order 2 while $x_{1.234}$
is of order 3. Similarly, $x_{1.234\ldots n}$ is a residual of order $(n-1)$.
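
The subscript conventions above can be sketched in a few lines of Python. This is only an illustration: the helper name `parse_subscript` is hypothetical, and it assumes single-digit variable indices written as a string such as "12.34".

```python
def parse_subscript(subscript):
    """Split a Yule-style subscript such as '12.34' into its parts.

    Assumption for illustration only: every variable index is a single digit.
    """
    primary, _, secondary = subscript.partition(".")
    return {
        # Order matters for primary subscripts (dependent, then independent).
        "primary": tuple(int(d) for d in primary),
        # Order is immaterial for secondary subscripts, so store them as a set.
        "secondary": frozenset(int(d) for d in secondary),
        # The order of a coefficient or residual is the number of secondary subscripts.
        "order": len(secondary),
    }


b = parse_subscript("12.34")
print(b["primary"], sorted(b["secondary"]), b["order"])  # (1, 2) [3, 4] 2

# b12.34 and b12.43 denote the same coefficient; b12.34 and b21.34 do not.
print(parse_subscript("12.34") == parse_subscript("12.43"))  # True
print(parse_subscript("12.34") == parse_subscript("21.34"))  # False
```

Storing the secondary subscripts as a set is one way to encode that their order does not matter, while the tuple keeps the dependent variable first.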

10.3 PLANE OF REGRESSION FOR THREE VARIABLES
Let us consider three variables $x_1$, $x_2$ and $x_3$ measured from their
respective means. The regression equation of $x_1$ on $x_2$ and $x_3$ is given
by (from equation (4))

$$x_1 = b_{12.3} x_2 + b_{13.2} x_3 \tag{9}$$

If $x_3$ is held constant, then the graph of $x_1$ against $x_2$ is a straight
line with slope $b_{12.3}$; similarly, if $x_2$ is held constant, the graph of
$x_1$ against $x_3$ is a straight line with slope $b_{13.2}$.
According to the principle of least squares, we have to determine the constants
$b_{12.3}$ and $b_{13.2}$ in such a way that the sum of squares of the residuals
is minimum, i.e.

$$U = \sum (x_1 - b_{12.3} x_2 - b_{13.2} x_3)^2 = \sum e_{1.23}^2 \quad \text{is minimum,}$$

where

$$e_{1.23} = x_1 - b_{12.3} x_2 - b_{13.2} x_3 \tag{10}$$


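By the principle of maxima and minima, we take the partial derivatives of U
with respect to $b_{12.3}$ and $b_{13.2}$ and equate them to zero. Thus,

$$\frac{\partial U}{\partial b_{12.3}} = 0 = \frac{\partial U}{\partial b_{13.2}}$$

Let us take $\dfrac{\partial U}{\partial b_{12.3}} = 0$:

$$\Rightarrow \sum 2 (x_1 - b_{12.3} x_2 - b_{13.2} x_3)(-x_2) = 0$$

$$\Rightarrow \sum x_2 (x_1 - b_{12.3} x_2 - b_{13.2} x_3) = 0 \tag{11}$$

$$\Rightarrow \sum (x_1 x_2 - b_{12.3} x_2^2 - b_{13.2} x_2 x_3) = 0$$

$$\Rightarrow b_{12.3} \sum x_2^2 + b_{13.2} \sum x_2 x_3 - \sum x_1 x_2 = 0 \tag{12}$$

Similarly,

$$\frac{\partial U}{\partial b_{13.2}} = 0 \;\Rightarrow\; \sum x_3 (x_1 - b_{12.3} x_2 - b_{13.2} x_3) = 0 \tag{13}$$

$$\Rightarrow b_{12.3} \sum x_2 x_3 + b_{13.2} \sum x_3^2 - \sum x_1 x_3 = 0 \tag{14}$$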
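
The normal equations (12) and (14) are two linear equations in $b_{12.3}$ and $b_{13.2}$, so they can be solved directly from the sums of squares and products. The following is a minimal sketch, assuming the data are already measured from their means; the function name and the small data set are hypothetical and chosen so that the answer is easy to check by hand.

```python
def partial_regression_coefficients(x1, x2, x3):
    """Solve normal equations (12) and (14) for b12.3 and b13.2.

    x1, x2, x3 are equal-length sequences measured from their means.
    """
    s22 = sum(v * v for v in x2)              # sum of x2^2
    s33 = sum(v * v for v in x3)              # sum of x3^2
    s23 = sum(u * v for u, v in zip(x2, x3))  # sum of x2*x3
    s12 = sum(u * v for u, v in zip(x1, x2))  # sum of x1*x2
    s13 = sum(u * v for u, v in zip(x1, x3))  # sum of x1*x3

    # Cramer's rule on
    #   b12.3*s22 + b13.2*s23 = s12
    #   b12.3*s23 + b13.2*s33 = s13
    det = s22 * s33 - s23 * s23
    if det == 0:
        raise ValueError("x2 and x3 are perfectly collinear; no unique solution")
    b12_3 = (s12 * s33 - s13 * s23) / det
    b13_2 = (s13 * s22 - s12 * s23) / det
    return b12_3, b13_2


# Hypothetical centred data built so that x1 = 2*x2 + 3*x3 exactly.
x2 = [-1, 0, 1, 0]
x3 = [0, -1, 0, 1]
x1 = [2 * u + 3 * v for u, v in zip(x2, x3)]
print(partial_regression_coefficients(x1, x2, x3))  # (2.0, 3.0)
```

The explicit check on the determinant reflects the fact that the plane is not uniquely determined when $x_2$ and $x_3$ are perfectly correlated.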
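As we know that

$$\sigma_i^2 = \frac{1}{N} \sum (x_i - \bar{x}_i)^2 = \frac{1}{N} \sum x_i^2 - \bar{x}_i^2 \qquad (\text{for } i = 1, 2 \text{ and } 3)$$

Since $x_1$, $x_2$ and $x_3$ are measured from their means,
$\bar{x}_1 = \bar{x}_2 = \bar{x}_3 = 0$, and then

$$\sigma_i^2 = \frac{1}{N} \sum x_i^2 \tag{15}$$

Similarly, we can write (for $i \neq j$; $i, j = 1, 2, 3$)

$$\mathrm{Cov}(x_i, x_j) = \frac{1}{N} \sum x_i x_j \tag{16}$$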
and consequently, using equations (15) and (16), the correlation coefficient
between $x_i$ and $x_j$ can be expressed as

$$r_{ij} = \frac{\mathrm{Cov}(x_i, x_j)}{\sqrt{V(x_i)\, V(x_j)}} = \frac{\sum x_i x_j}{N \sigma_i \sigma_j} \;\Rightarrow\; \mathrm{Cov}(x_i, x_j) = r_{ij}\, \sigma_i \sigma_j \tag{17}$$

Dividing equations (12) and (14) by N provides

$$b_{12.3} \frac{1}{N} \sum x_2^2 + b_{13.2} \frac{1}{N} \sum x_2 x_3 - \frac{1}{N} \sum x_1 x_2 = 0 \tag{18}$$

and

$$b_{12.3} \frac{1}{N} \sum x_2 x_3 + b_{13.2} \frac{1}{N} \sum x_3^2 - \frac{1}{N} \sum x_1 x_3 = 0 \tag{19}$$
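Using equations (15), (16) and (17) in equations (18) and (19), we have, from
equation (18),

$$b_{12.3}\, \sigma_2^2 + b_{13.2}\, \mathrm{Cov}(x_2, x_3) - \mathrm{Cov}(x_1, x_2) = 0$$

$$\Rightarrow b_{12.3}\, \sigma_2^2 + b_{13.2}\, r_{23}\, \sigma_2 \sigma_3 - r_{12}\, \sigma_1 \sigma_2 = 0$$

$$\Rightarrow \sigma_2 \left( b_{12.3}\, \sigma_2 + b_{13.2}\, r_{23}\, \sigma_3 - r_{12}\, \sigma_1 \right) = 0$$

$$\Rightarrow b_{12.3}\, \sigma_2 + b_{13.2}\, r_{23}\, \sigma_3 - r_{12}\, \sigma_1 = 0 \tag{20}$$

Similarly, from equation (19),

$$b_{12.3}\, \mathrm{Cov}(x_2, x_3) + b_{13.2}\, \sigma_3^2 - \mathrm{Cov}(x_1, x_3) = 0$$

$$\Rightarrow b_{12.3}\, r_{23}\, \sigma_2 \sigma_3 + b_{13.2}\, \sigma_3^2 - r_{13}\, \sigma_1 \sigma_3 = 0$$

$$\Rightarrow \sigma_3 \left( b_{12.3}\, r_{23}\, \sigma_2 + b_{13.2}\, \sigma_3 - r_{13}\, \sigma_1 \right) = 0$$

$$\Rightarrow b_{12.3}\, r_{23}\, \sigma_2 + b_{13.2}\, \sigma_3 - r_{13}\, \sigma_1 = 0 \tag{21}$$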

where $r_{12}$ is the total correlation coefficient between $x_1$ and $x_2$,
$r_{13}$ is the total correlation coefficient between $x_1$ and $x_3$ and,
similarly, $r_{23}$ is the total correlation coefficient between $x_2$ and
$x_3$. Thus, we have the two equations (20) and (21).
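Solving equations (20) and (21) for $b_{12.3}$ and $b_{13.2}$, we obtain

$$b_{12.3} = \frac{\begin{vmatrix} r_{12}\sigma_1 & r_{23}\sigma_3 \\ r_{13}\sigma_1 & \sigma_3 \end{vmatrix}}{\begin{vmatrix} \sigma_2 & r_{23}\sigma_3 \\ r_{23}\sigma_2 & \sigma_3 \end{vmatrix}}$$

$$b_{12.3} = \frac{\sigma_1}{\sigma_2} \cdot \frac{\begin{vmatrix} r_{12} & r_{23} \\ r_{13} & 1 \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix}} = \frac{\sigma_1}{\sigma_2} \cdot \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} \tag{22}$$

Similarly,

$$b_{13.2} = \frac{\sigma_1}{\sigma_3} \cdot \frac{\begin{vmatrix} 1 & r_{12} \\ r_{23} & r_{13} \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix}} = \frac{\sigma_1}{\sigma_3} \cdot \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} \tag{23}$$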
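If we write

$$t = \begin{vmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{vmatrix} \tag{24}$$

then $b_{12.3}$ and $b_{13.2}$ can be written as

$$b_{12.3} = -\frac{\sigma_1}{\sigma_2} \cdot \frac{t_{12}}{t_{11}}$$

and

$$b_{13.2} = -\frac{\sigma_1}{\sigma_3} \cdot \frac{t_{13}}{t_{11}}$$

where $t_{ij}$ is the cofactor of the element in the i-th row and j-th column
of t.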
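Substituting the values of $b_{12.3}$ and $b_{13.2}$ in equation (9), we get the
equation of the plane of regression of $x_1$ on $x_2$ and $x_3$ as

$$x_1 = -\frac{\sigma_1}{\sigma_2} \cdot \frac{t_{12}}{t_{11}}\, x_2 - \frac{\sigma_1}{\sigma_3} \cdot \frac{t_{13}}{t_{11}}\, x_3 \qquad (\text{where } a = 0)$$

$$\Rightarrow \frac{x_1}{\sigma_1} t_{11} + \frac{x_2}{\sigma_2} t_{12} + \frac{x_3}{\sigma_3} t_{13} = 0 \tag{25}$$

Similarly, the plane of regression of $x_2$ on $x_1$ and $x_3$ is given by

$$\frac{x_1}{\sigma_1} t_{21} + \frac{x_2}{\sigma_2} t_{22} + \frac{x_3}{\sigma_3} t_{23} = 0 \tag{26}$$

and the plane of regression of $x_3$ on $x_1$ and $x_2$ is

$$\frac{x_1}{\sigma_1} t_{31} + \frac{x_2}{\sigma_2} t_{32} + \frac{x_3}{\sigma_3} t_{33} = 0 \tag{27}$$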
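In general, the plane of regression of $x_i$ on the remaining variables $x_j$
($j \neq i$; $j = 1, 2, \ldots, n$) is given by

$$\frac{x_1}{\sigma_1} t_{i1} + \frac{x_2}{\sigma_2} t_{i2} + \ldots + \frac{x_i}{\sigma_i} t_{ii} + \ldots + \frac{x_n}{\sigma_n} t_{in} = 0; \qquad i = 1, 2, \ldots, n \tag{28}$$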
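
Equation (28) can be turned into a small routine that reads the regression coefficients off the cofactors of the correlation determinant. The sketch below is illustrative only: the function names are hypothetical, indices are 0-based (so variable $x_1$ is index 0), the cofactors are computed by plain Laplace expansion, and the correlation values are made up to compare the result with equations (22) and (23).

```python
def det(m):
    """Determinant of a square matrix (list of lists) by Laplace expansion."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total


def cofactor(m, i, j):
    """Cofactor of the element in row i, column j (0-based)."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)


def plane_coefficients(corr, sigma, i):
    """Coefficients of x_j (j != i) in the plane of regression of x_i.

    Rearranging equation (28) for x_i gives the coefficient of x_j as
    -(sigma_i / sigma_j) * t_ij / t_ii.
    """
    t_ii = cofactor(corr, i, i)
    return {
        j: -(sigma[i] / sigma[j]) * cofactor(corr, i, j) / t_ii
        for j in range(len(sigma))
        if j != i
    }


# Hypothetical values: r12 = 0.5, r13 = 0.4, r23 = 0.3 and sigma = (2, 1, 4).
corr = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]]
sigma = [2.0, 1.0, 4.0]
coeffs = plane_coefficients(corr, sigma, 0)

# The same coefficients from equations (22) and (23).
b12_3 = (2.0 / 1.0) * (0.5 - 0.4 * 0.3) / (1 - 0.3 ** 2)
b13_2 = (2.0 / 4.0) * (0.4 - 0.5 * 0.3) / (1 - 0.3 ** 2)
print(abs(coeffs[1] - b12_3) < 1e-12, abs(coeffs[2] - b13_2) < 1e-12)  # True True
```

Laplace expansion is chosen here only because it mirrors the cofactor definition in the text; for more than a handful of variables a numerical linear algebra routine would be the usual choice.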

Example 1: From the data given in the following table, find out
(i) the least squares regression equation of $X_1$ on $X_2$ and $X_3$;

(ii) the estimate of $X_1$ for $X_2 = 45$ and $X_3 = 8$.

X1     1     3     4     7    10

X2     3     5     6     7     9

X3     4     5     6     9    11

Solution: (i) Here $X_1$, $X_2$ and $X_3$ are three random variables with their
respective means $\bar{X}_1$, $\bar{X}_2$ and $\bar{X}_3$.

Let $X_1 - \bar{X}_1 = x_1$, $X_2 - \bar{X}_2 = x_2$ and $X_3 - \bar{X}_3 = x_3$.

Then the linear regression equation of $x_1$ on $x_2$ and $x_3$ is

$$x_1 = b_{12.3} x_2 + b_{13.2} x_3$$

From equations (22) and (23), we have

$$b_{12.3} = \frac{\sigma_1}{\sigma_2} \cdot \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2}$$

and

$$b_{13.2} = \frac{\sigma_1}{\sigma_3} \cdot \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2}$$

The values of $\sigma_1$, $\sigma_2$, $\sigma_3$, $r_{12}$, $r_{13}$ and $r_{23}$
can be obtained through the calculations given in the following table:

S. No.   X1   X2   X3   x1 = X1−5   x2 = X2−6   x3 = X3−7   x1²   x2²   x3²   x1x2   x1x3   x2x3
1         1    3    4      −4          −3          −3        16     9     9     12     12      9
2         3    5    5      −2          −1          −2         4     1     4      2      4      2
3         4    6    6      −1           0          −1         1     0     1      0      1      0
4         7    7    9       2           1           2         4     1     4      2      4      2
5        10    9   11       5           3           4        25     9    16     15     20     12
Total    25   30   35       0           0           0        50    20    34     31     41     25

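$$\bar{X}_1 = \frac{\sum X_1}{N} = \frac{25}{5} = 5, \qquad \bar{X}_2 = \frac{\sum X_2}{N} = \frac{30}{5} = 6, \qquad \bar{X}_3 = \frac{\sum X_3}{N} = \frac{35}{5} = 7$$

From equation (15),

$$\sigma_1^2 = \frac{1}{N} \sum x_1^2 = \frac{1}{5} \times 50 = 10 \;\Rightarrow\; \sigma_1 = \sqrt{10} = 3.162$$

$$\sigma_2^2 = \frac{1}{N} \sum x_2^2 = \frac{1}{5} \times 20 = 4 \;\Rightarrow\; \sigma_2 = \sqrt{4} = 2$$

$$\sigma_3^2 = \frac{1}{N} \sum x_3^2 = \frac{1}{5} \times 34 = 6.8 \;\Rightarrow\; \sigma_3 = \sqrt{6.8} = 2.608$$

From equation (17),

$$r_{12} = \frac{\sum x_1 x_2}{N \sigma_1 \sigma_2} = \frac{31}{5 \times 3.162 \times 2} = 0.98$$

$$r_{13} = \frac{\sum x_1 x_3}{N \sigma_1 \sigma_3} = \frac{41}{5 \times 3.162 \times 2.608} = 0.994$$

$$r_{23} = \frac{\sum x_2 x_3}{N \sigma_2 \sigma_3} = \frac{25}{5 \times 2 \times 2.608} = 0.959$$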
Now, we have

$$b_{12.3} = \frac{\sigma_1}{\sigma_2} \cdot \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} = \frac{3.162}{2} \cdot \frac{0.98 - 0.994 \times 0.959}{1 - (0.959)^2} = 0.527$$

$$b_{13.2} = \frac{\sigma_1}{\sigma_3} \cdot \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} = \frac{3.162}{2.608} \cdot \frac{0.994 - 0.98 \times 0.959}{1 - (0.959)^2} = 0.818$$

Thus, the regression equation of $x_1$ on $x_2$ and $x_3$ is

$$x_1 = 0.527\, x_2 + 0.818\, x_3$$

After substituting the values of $x_1$, $x_2$ and $x_3$, we get the following
regression equation of $X_1$ on $X_2$ and $X_3$:

$$X_1 - 5 = 0.527 (X_2 - 6) + 0.818 (X_3 - 7)$$

$$\Rightarrow X_1 = -3.891 + 0.527\, X_2 + 0.818\, X_3$$

(ii) Substituting $X_2 = 45$ and $X_3 = 8$ in the regression equation

$$X_1 = -3.891 + 0.527\, X_2 + 0.818\, X_3$$

we get the estimated value of $X_1$, i.e. $X_1 = 26.38$ (using the unrounded
coefficients).
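
The arithmetic of Example 1 can be reproduced with a short script. This is a sketch for checking the numbers only: it uses the $X_1$, $X_2$, $X_3$ values from the calculation table, follows the same route through $\sigma_i$, $r_{ij}$ and equations (22) and (23), and all helper names are hypothetical.

```python
from math import sqrt

# Observations taken from the calculation table of Example 1.
X1 = [1, 3, 4, 7, 10]
X2 = [3, 5, 6, 7, 9]
X3 = [4, 5, 6, 9, 11]
N = len(X1)


def centred(values):
    """Return the deviations from the mean, together with the mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values], mean


def sd(x):
    """Standard deviation of centred data, equation (15)."""
    return sqrt(sum(v * v for v in x) / N)


def corr(x, y):
    """Correlation coefficient of centred data, equation (17)."""
    return sum(u * v for u, v in zip(x, y)) / (N * sd(x) * sd(y))


x1, m1 = centred(X1)
x2, m2 = centred(X2)
x3, m3 = centred(X3)

s1, s2, s3 = sd(x1), sd(x2), sd(x3)
r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)

b12_3 = (s1 / s2) * (r12 - r13 * r23) / (1 - r23 ** 2)  # equation (22)
b13_2 = (s1 / s3) * (r13 - r12 * r23) / (1 - r23 ** 2)  # equation (23)
a = m1 - b12_3 * m2 - b13_2 * m3                        # intercept, from equation (2)

estimate = a + b12_3 * 45 + b13_2 * 8
print(round(b12_3, 3), round(b13_2, 3), round(a, 3), round(estimate, 2))
# 0.527 0.818 -3.891 26.38
```

Because the script keeps full precision throughout, its estimate agrees with 26.38; substituting the three-decimal coefficients by hand gives a slightly smaller value, about 26.37.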


Let us solve some exercises.

E1) For the data given in the following table, find out
(i) the regression equation of $X_1$ on $X_2$ and $X_3$;

(ii) the estimate of $X_1$ for $X_2 = 6$ and $X_3 = 8$.

X1     2     6     8    10

X2     4     5     9    12

X3     4     6    10    12

10.4 PROPERTIES OF RESIDUALS


Property 1: The sum of the products of a variate and a residual is zero if the
subscript of the variate occurs among the secondary subscripts of the
residual, i.e. $\sum x_2 e_{1.23} = 0$. Here the subscript of the variate
$x_2$, i.e. 2, appears among the secondary subscripts of $e_{1.23}$.

Proof: The regression equation of $x_1$ on $x_2$ and $x_3$ is

$$x_1 = b_{12.3} x_2 + b_{13.2} x_3$$

Here, $x_1$, $x_2$ and $x_3$ are measured from their respective means.
Using equation (10) in equations (11) and (13), we have the following normal
equations:

$$\sum x_2 e_{1.23} = 0 = \sum x_3 e_{1.23}$$

Similarly, the normal equations for the regression of $x_2$ on $x_1$ and $x_3$,
and of $x_3$ on $x_1$ and $x_2$, are

$$\sum x_1 e_{2.13} = 0 = \sum x_3 e_{2.13}$$

$$\sum x_2 e_{3.12} = 0 = \sum x_1 e_{3.12}$$

Property 2: The sum of the products of two residuals is zero provided all the
subscripts, primary as well as secondary, of the first residual appear among
the secondary subscripts of the second residual, i.e.
$\sum x_{3.2}\, e_{1.23} = 0$, since the primary and secondary subscripts
(3 and 2) of the first residual appear among the secondary subscripts of the
second residual.
Proof: We have

$$\sum x_{3.2}\, e_{1.23} = \sum (x_3 - b_{32} x_2)\, e_{1.23} = \sum x_3 e_{1.23} - b_{32} \sum x_2 e_{1.23} = 0$$

(From Property 1: $\sum x_3 e_{1.23} = 0$ and $\sum x_2 e_{1.23} = 0$)

Similarly,

$$\sum x_{2.3}\, e_{1.23} = 0$$
Property 3: The sum of the products of any two residuals is unaltered if all
the secondary subscripts of the first occur among the secondary subscripts of
the second and we omit any or all of the secondary subscripts of the first.
Proof: We have to show that

$$\sum x_{1.2}\, e_{1.23} = \sum x_1 e_{1.23},$$

i.e. the left hand side and the right hand side are equal even when the
secondary subscript of the first residual is omitted. Now,

$$\sum x_{1.2}\, e_{1.23} = \sum (x_1 - b_{12} x_2)\, e_{1.23} = \sum x_1 e_{1.23} - b_{12} \sum x_2 e_{1.23} = \sum x_1 e_{1.23}$$

(From Property 1, $\sum x_2 e_{1.23} = 0$)
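
The three properties can be checked numerically on any data set. The sketch below uses the observations from the calculation table of Example 1; the variable names are hypothetical, and $b_{32}$ and $b_{12}$ are taken to be the simple regression coefficients of Unit 9, computed here as $\sum x_3 x_2 / \sum x_2^2$ and $\sum x_1 x_2 / \sum x_2^2$ (an assumption, since this unit does not restate their formula).

```python
X1 = [1, 3, 4, 7, 10]
X2 = [3, 5, 6, 7, 9]
X3 = [4, 5, 6, 9, 11]


def centre(values):
    """Measure a variable from its mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    """Sum of products of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


x1, x2, x3 = centre(X1), centre(X2), centre(X3)

# Partial regression coefficients from normal equations (12) and (14).
det = dot(x2, x2) * dot(x3, x3) - dot(x2, x3) ** 2
b12_3 = (dot(x1, x2) * dot(x3, x3) - dot(x1, x3) * dot(x2, x3)) / det
b13_2 = (dot(x1, x3) * dot(x2, x2) - dot(x1, x2) * dot(x2, x3)) / det

# Residual of order 2, equation (10).
e1_23 = [a - b12_3 * b - b13_2 * c for a, b, c in zip(x1, x2, x3)]

# Residuals of order 1, using simple regression coefficients (assumed from Unit 9).
b32 = dot(x3, x2) / dot(x2, x2)
b12 = dot(x1, x2) / dot(x2, x2)
x3_2 = [c - b32 * b for b, c in zip(x2, x3)]
x1_2 = [a - b12 * b for a, b in zip(x1, x2)]

prop1 = (dot(x2, e1_23), dot(x3, e1_23))    # both should be (numerically) zero
prop2 = dot(x3_2, e1_23)                    # should be (numerically) zero
prop3 = (dot(x1_2, e1_23), dot(x1, e1_23))  # the two sums should agree

print(all(abs(v) < 1e-9 for v in prop1), abs(prop2) < 1e-9,
      abs(prop3[0] - prop3[1]) < 1e-9)  # True True True
```

The comparisons use a small tolerance rather than exact equality because the sums are computed in floating point.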

Now let us do some short exercises.

E2) Show that $\sum x_2 e_{1.23} = 0$.

E3) Show that $\sum x_3 e_{1.23} = 0$.

10.5 VARIANCE OF THE RESIDUALS


Let $x_1$, $x_2$ and $x_3$ be three random variables; then the plane of
regression of $x_1$ on $x_2$ and $x_3$ is defined as

$$x_1 = a + b_{12.3} x_2 + b_{13.2} x_3$$

Since $x_1$, $x_2$ and $x_3$ are measured from their respective means,
$\sum x_1 = \sum x_2 = \sum x_3 = 0$, so we get a = 0 and the regression
equation becomes

$$x_1 = b_{12.3} x_2 + b_{13.2} x_3$$

and the error of the estimate, or residual, is (see Section 10.2)

$$e_{1.23} = x_1 - b_{12.3} x_2 - b_{13.2} x_3$$

Now the variance of the residual is denoted by $\sigma_{1.23}^2$ and defined as

$$\sigma_{1.23}^2 = \frac{1}{N} \sum (e_{1.23} - \bar{e}_{1.23})^2$$

Here $\bar{e}_{1.23} = \bar{x}_1 - b_{12.3} \bar{x}_2 - b_{13.2} \bar{x}_3 = 0$
because $\sum x_1 = \sum x_2 = \sum x_3 = 0$, and so

$$\sigma_{1.23}^2 = \frac{1}{N} \sum e_{1.23}^2 \tag{29}$$

$$= \frac{1}{N} \sum (x_1 - b_{12.3} x_2 - b_{13.2} x_3)(x_1 - b_{12.3} x_2 - b_{13.2} x_3)$$

$$= \frac{1}{N} \sum x_1 (x_1 - b_{12.3} x_2 - b_{13.2} x_3) - \frac{1}{N} \sum b_{12.3} x_2 (x_1 - b_{12.3} x_2 - b_{13.2} x_3) - \frac{1}{N} \sum b_{13.2} x_3 (x_1 - b_{12.3} x_2 - b_{13.2} x_3)$$

$$\sigma_{1.23}^2 = \frac{1}{N} \sum (x_1^2 - b_{12.3} x_1 x_2 - b_{13.2} x_1 x_3) - \frac{1}{N} \sum (b_{12.3} x_1 x_2 - b_{12.3}^2 x_2^2 - b_{12.3} b_{13.2} x_2 x_3) - \frac{1}{N} \sum (b_{13.2} x_1 x_3 - b_{12.3} b_{13.2} x_2 x_3 - b_{13.2}^2 x_3^2)$$
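We know that

$$b_{12.3} \sum x_2^2 + b_{13.2} \sum x_2 x_3 - \sum x_1 x_2 = 0$$

and

$$b_{12.3} \sum x_2 x_3 + b_{13.2} \sum x_3^2 - \sum x_1 x_3 = 0$$

(see equations (12) and (14) of Section 10.3), so the second and third sums in
the expansion of $\sigma_{1.23}^2$ vanish.

Therefore,

$$\sigma_{1.23}^2 = \frac{1}{N} \sum x_1^2 - b_{12.3} \frac{1}{N} \sum x_1 x_2 - b_{13.2} \frac{1}{N} \sum x_1 x_3$$

$$\sigma_{1.23}^2 = \sigma_1^2 - b_{12.3}\, r_{12}\, \sigma_1 \sigma_2 - b_{13.2}\, r_{13}\, \sigma_1 \sigma_3$$

Substituting $b_{12.3}$ and $b_{13.2}$ from equations (22) and (23),

$$\sigma_{1.23}^2 = \sigma_1^2 - \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2}\, \sigma_1^2\, r_{12} - \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2}\, \sigma_1^2\, r_{13}$$

$$\sigma_{1.23}^2 = \frac{\sigma_1^2}{1 - r_{23}^2} \left( 1 - r_{23}^2 - r_{12}^2 - r_{13}^2 + 2 r_{12} r_{23} r_{13} \right) \tag{30}$$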
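
Equation (30) can be compared with the direct definition (29) on actual numbers. The following sketch again borrows the observations from the calculation table of Example 1; the helper names are hypothetical, and the script simply evaluates both expressions and prints them side by side.

```python
from math import sqrt

X1 = [1, 3, 4, 7, 10]
X2 = [3, 5, 6, 7, 9]
X3 = [4, 5, 6, 9, 11]


def centre(values):
    """Measure a variable from its mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    """Sum of products of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


x1, x2, x3 = centre(X1), centre(X2), centre(X3)
N = len(x1)

# Variance of x1 and the total correlation coefficients, equations (15) and (17).
var1 = dot(x1, x1) / N
r12 = dot(x1, x2) / sqrt(dot(x1, x1) * dot(x2, x2))
r13 = dot(x1, x3) / sqrt(dot(x1, x1) * dot(x3, x3))
r23 = dot(x2, x3) / sqrt(dot(x2, x2) * dot(x3, x3))

# Residuals from the fitted plane, via normal equations (12) and (14).
det = dot(x2, x2) * dot(x3, x3) - dot(x2, x3) ** 2
b12_3 = (dot(x1, x2) * dot(x3, x3) - dot(x1, x3) * dot(x2, x3)) / det
b13_2 = (dot(x1, x3) * dot(x2, x2) - dot(x1, x2) * dot(x2, x3)) / det
residuals = [a - b12_3 * b - b13_2 * c for a, b, c in zip(x1, x2, x3)]

direct = dot(residuals, residuals) / N  # equation (29)
formula = var1 * (1 - r23 ** 2 - r12 ** 2 - r13 ** 2
                  + 2 * r12 * r23 * r13) / (1 - r23 ** 2)  # equation (30)

print(round(direct, 5), round(formula, 5))  # 0.02182 0.02182
```

The residual variance is small relative to $\sigma_1^2 = 10$ because $X_1$ is very nearly a linear function of $X_2$ and $X_3$ in this data set.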

10.6 SUMMARY
In this unit, we have discussed:
1. Yule’s notation for a trivariate distribution;
2. The plane of regression for a trivariate distribution;
3. How to get the normal equations for the regression equation of
$x_1$ on $x_2$ and $x_3$;
4. The properties of residuals;
5. The variance of residuals; and
6. How to estimate the dependent variable from a regression equation
in three variables.

10.7 SOLUTIONS / ANSWERS

E1) (i) Here $X_1$, $X_2$ and $X_3$ are three random variables with their
respective means $\bar{X}_1$, $\bar{X}_2$ and $\bar{X}_3$.

Let $X_1 - \bar{X}_1 = x_1$, $X_2 - \bar{X}_2 = x_2$ and $X_3 - \bar{X}_3 = x_3$.

Then the linear regression equation of $x_1$ on $x_2$ and $x_3$ is

$$x_1 = b_{12.3} x_2 + b_{13.2} x_3$$

From equations (22) and (23), we have

$$b_{12.3} = \frac{\sigma_1}{\sigma_2} \cdot \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2}$$

and

$$b_{13.2} = \frac{\sigma_1}{\sigma_3} \cdot \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2}$$

$\sigma_1$, $\sigma_2$, $\sigma_3$, $r_{12}$, $r_{13}$ and $r_{23}$ can be
obtained through the following table.

S. No.   X1   X2   X3   x1 = X1−6.5   x2 = X2−7.5   x3 = X3−8   x1²     x2²     x3²   x1x2    x1x3   x2x3
1         2    4    4      −4.5          −3.5          −4       20.25   12.25   16    15.75    18     14
2         6    5    6      −0.5          −2.5          −2        0.25    6.25    4     1.25     1      5
3         8    9   10       1.5           1.5           2        2.25    2.25    4     2.25     3      3
4        10   12   12       3.5           4.5           4       12.25   20.25   16    15.75    14     18
Total    26   30   32       0             0             0       35      41      40    35       36     40

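$$\bar{X}_1 = \frac{\sum X_1}{N} = \frac{26}{4} = 6.5, \qquad \bar{X}_2 = \frac{\sum X_2}{N} = \frac{30}{4} = 7.5, \qquad \bar{X}_3 = \frac{\sum X_3}{N} = \frac{32}{4} = 8$$

From equation (15),

$$\sigma_1^2 = \frac{1}{N} \sum x_1^2 = \frac{1}{4} \times 35 = 8.75 \;\Rightarrow\; \sigma_1 = \sqrt{8.75} = 2.958$$

$$\sigma_2^2 = \frac{1}{N} \sum x_2^2 = \frac{1}{4} \times 41 = 10.25 \;\Rightarrow\; \sigma_2 = \sqrt{10.25} = 3.202$$

$$\sigma_3^2 = \frac{1}{N} \sum x_3^2 = \frac{1}{4} \times 40 = 10 \;\Rightarrow\; \sigma_3 = \sqrt{10} = 3.162$$

From equation (17),

$$r_{12} = \frac{\sum x_1 x_2}{N \sigma_1 \sigma_2} = \frac{35}{4 \times 2.958 \times 3.202} = 0.924$$

$$r_{13} = \frac{\sum x_1 x_3}{N \sigma_1 \sigma_3} = \frac{36}{4 \times 2.958 \times 3.162} = 0.962$$

$$r_{23} = \frac{\sum x_2 x_3}{N \sigma_2 \sigma_3} = \frac{40}{4 \times 3.202 \times 3.162} = 0.988$$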
Now, we have

$$b_{12.3} = \frac{\sigma_1}{\sigma_2} \cdot \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} = \frac{2.958}{3.202} \cdot \frac{0.924 - 0.962 \times 0.988}{1 - (0.988)^2} = -1$$

$$b_{13.2} = \frac{\sigma_1}{\sigma_3} \cdot \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} = \frac{2.958}{3.162} \cdot \frac{0.962 - 0.924 \times 0.988}{1 - (0.988)^2} = 1.9$$

Thus, the regression equation of $x_1$ on $x_2$ and $x_3$ is

$$x_1 = -x_2 + 1.9\, x_3$$

After substituting the values of $x_1$, $x_2$ and $x_3$, we get the following
regression equation of $X_1$ on $X_2$ and $X_3$:

$$X_1 - 6.5 = -(X_2 - 7.5) + 1.9 (X_3 - 8)$$

$$\Rightarrow X_1 = -1.2 - X_2 + 1.9\, X_3$$

(ii) Substituting $X_2 = 6$ and $X_3 = 8$ in the regression equation

$$X_1 = -1.2 - X_2 + 1.9\, X_3$$

we get the estimated value of $X_1$, i.e. $X_1 = 8$.
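
The answer to E1 can be cross-checked by solving the normal equations (12) and (14) on the raw data instead of going through the correlation coefficients. This is a hedged sketch: `fit_plane` is a hypothetical helper name, and the intercept is recovered from equation (2).

```python
def fit_plane(X1, X2, X3):
    """Return (a, b12_3, b13_2) for the plane X1 = a + b12_3*X2 + b13_2*X3."""
    n = len(X1)
    m1, m2, m3 = sum(X1) / n, sum(X2) / n, sum(X3) / n
    # Measure each variable from its mean.
    x1 = [v - m1 for v in X1]
    x2 = [v - m2 for v in X2]
    x3 = [v - m3 for v in X3]
    s22 = sum(v * v for v in x2)
    s33 = sum(v * v for v in x3)
    s23 = sum(u * v for u, v in zip(x2, x3))
    s12 = sum(u * v for u, v in zip(x1, x2))
    s13 = sum(u * v for u, v in zip(x1, x3))
    # Cramer's rule on normal equations (12) and (14).
    det = s22 * s33 - s23 ** 2
    b12_3 = (s12 * s33 - s13 * s23) / det
    b13_2 = (s13 * s22 - s12 * s23) / det
    # Intercept from equation (2).
    a = m1 - b12_3 * m2 - b13_2 * m3
    return a, b12_3, b13_2


# Data of exercise E1.
a, b12_3, b13_2 = fit_plane([2, 6, 8, 10], [4, 5, 9, 12], [4, 6, 10, 12])
estimate = a + b12_3 * 6 + b13_2 * 8
print(round(a, 3), round(b12_3, 3), round(b13_2, 3), round(estimate, 3))
# -1.2 -1.0 1.9 8.0
```

Working from the sums of squares and products avoids the rounding of $\sigma_i$ and $r_{ij}$ to three decimals, which is why the hand calculation above only approximately reproduces $-1$ and $1.9$ before rounding.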

E2) Hint: According to Property 1, $\sum x_2 e_{1.23} = 0$, since the subscript
of the variate $x_2$, i.e. 2, appears among the secondary subscripts of
$e_{1.23}$, i.e. in 23.

E3) Hint: According to Property 1, $\sum x_3 e_{1.23} = 0$, since the subscript
of the variate $x_3$, i.e. 3, appears among the secondary subscripts of
$e_{1.23}$, i.e. in 23.
