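If (X, Y) is a two-dimensional random variable uniformly distributed over the triangular region R bounded by y = 0, x = 3 and y = 4x/3, find f(x), f(y), E(X), Var(X), E(Y) and ρXY.
Solution:
Given that X and Y are uniformly distributed over R,
f(x, y) = k (a constant) for (x, y) in R.
We know that the double integral of f(x, y) over R equals 1, that is,
$$\int_0^4\int_{3y/4}^{3}k\,dx\,dy=1\;\Rightarrow\;k\int_0^4\left[x\right]_{3y/4}^{3}dy=k\int_0^4\left(3-\frac{3y}{4}\right)dy=6k=1\;\Rightarrow\;k=\frac16$$
$$f(y)=\int_{3y/4}^{3}f(x,y)\,dx=\int_{3y/4}^{3}\frac16\,dx=\frac18(4-y),\qquad 0\le y\le 4$$
$$f(x)=\int_0^{4x/3}\frac16\,dy=\frac{2x}{9},\qquad 0\le x\le 3$$
$$E(X)=\int_0^3 x f(x)\,dx=\int_0^3\frac{2x^2}{9}\,dx=2,\qquad E(Y)=\int_0^4 y f(y)\,dy=\int_0^4\frac{y(4-y)}{8}\,dy=\frac43$$
$$E(X^2)=\int_0^3 x^2 f(x)\,dx=\frac92,\qquad E(Y^2)=\int_0^4 y^2 f(y)\,dy=\frac83$$
$$\operatorname{Var}(X)=E(X^2)-[E(X)]^2=\frac92-4=\frac12,\qquad \operatorname{Var}(Y)=E(Y^2)-[E(Y)]^2=\frac83-\frac{16}{9}=\frac89$$
$$E(XY)=\frac16\int_0^4\int_{3y/4}^{3}xy\,dx\,dy=3$$
Now,
$$\rho_{XY}=\frac{E(XY)-E(X)E(Y)}{\sigma_X\,\sigma_Y}=\frac{3-2\cdot\frac43}{\sqrt{\frac12}\,\sqrt{\frac89}}=\frac{1/3}{2/3}=\frac12$$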
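The moments above can be double-checked exactly. The sketch below is an illustrative Python helper; the name `moment` is ours and does not come from the text. It assumes the density k = 1/6 and the region 0 ≤ y ≤ 4x/3, 0 ≤ x ≤ 3 derived above. It evaluates each E(X^a Y^b) in closed form with exact fractions, so no numerical integration error is involved.

```python
from fractions import Fraction
import math

K = Fraction(1, 6)       # constant joint density over the triangle (k = 1/6, derived above)
SLOPE = Fraction(4, 3)   # upper boundary of the region: y = 4x/3

def moment(a: int, b: int) -> Fraction:
    """E[X^a Y^b] over 0 <= x <= 3, 0 <= y <= 4x/3 with density K.

    Inner integral: int_0^{4x/3} y^b dy = (4x/3)^(b+1) / (b+1).
    Outer integral: int_0^3 x^(a+b+1) dx = 3^(a+b+2) / (a+b+2).
    """
    return K * SLOPE ** (b + 1) * Fraction(3) ** (a + b + 2) / ((b + 1) * (a + b + 2))

total = moment(0, 0)                  # must equal 1 for a valid density
ex, ey = moment(1, 0), moment(0, 1)   # E(X), E(Y)
var_x = moment(2, 0) - ex ** 2
var_y = moment(0, 2) - ey ** 2
cov = moment(1, 1) - ex * ey
rho = float(cov) / math.sqrt(float(var_x * var_y))

print(total, ex, ey, var_x, var_y, moment(1, 1), rho)
```

Exact fractions are used so the results can be compared directly with 4/3, 8/9 and so on, rather than with rounded decimals.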
Correlation:
If a change in one variable is accompanied by a change in the other variable, the variables are said to be correlated.
The correlation between two variables measures the degree of relationship between them.
Examples: 1. The correlation between the heights and weights of a group of persons.
2. The correlation between income and expenditure, and so on.
Problems:
1. Calculate the correlation coefficient for the following heights (in inches) of fathers X
and their sons Y.
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
Solution:
X Y XY X² Y²
65 67 4355 4225 4489
66 68 4488 4356 4624
67 65 4355 4489 4225
67 68 4556 4489 4624
68 72 4896 4624 5184
69 72 4968 4761 5184
70 69 4830 4900 4761
72 71 5112 5184 5041
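Σ 544 552 37560 37028 38132
$$\bar X=\frac{\sum X}{n}=\frac{544}{8}=68,\qquad \bar Y=\frac{\sum Y}{n}=\frac{552}{8}=69,\qquad \bar X\bar Y=(68)(69)=4692$$
$$\sigma_X=\sqrt{\frac1n\sum X^2-\bar X^2}=\sqrt{\frac{37028}{8}-4624}=2.121$$
$$\sigma_Y=\sqrt{\frac1n\sum Y^2-\bar Y^2}=\sqrt{\frac{38132}{8}-4761}=2.345$$
$$r(X,Y)=\frac{\frac1n\sum XY-\bar X\,\bar Y}{\sigma_X\,\sigma_Y}=\frac{\frac{37560}{8}-4692}{(2.121)(2.345)}=0.6030$$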
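The computation above follows a fixed recipe: form the means, the population standard deviations and the mean cross-product. A minimal Python sketch of that recipe follows. The function name `pearson_r` is illustrative, and population (divide-by-n) standard deviations are assumed, as in the text.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r via r = (sum(XY)/n - mean_x*mean_y) / (sigma_x * sigma_y),
    with population standard deviations sigma = sqrt(sum(X^2)/n - mean^2)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sigma_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sigma_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    return cov / (sigma_x * sigma_y)

fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]
print(round(pearson_r(fathers, sons), 4))
```

Applied to the two-row table that follows (X = 10, 14, ..., 30), the same function reproduces the value r = 0.6 quoted for that data in the exercises.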
X 10 14 18 22 26 30
Y 18 12 24 6 30 36
3. Let X, Y and Z be uncorrelated random variables with zero means and standard
deviations 5, 12 and 9 respectively. If U = X + Y and V = Y + Z, find the correlation
coefficient between U and V.
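Solution:
All three random variables have zero mean, so E(X) = E(Y) = E(Z) = 0.
Now, Var(X) = E(X²) - [E(X)]², so E(X²) = Var(X) = 5² = 25 (since E(X) = 0).
Similarly, E(Y²) = 12² = 144 and E(Z²) = 9² = 81.
Because the variables are uncorrelated and have zero means, E(XY) = E(YZ) = E(XZ) = 0.
To find ρ(U, V):
$$\rho(U,V)=\frac{E(UV)-E(U)E(V)}{\sigma_U\,\sigma_V},\qquad E(U)=E(V)=0$$
$$E(U^2)=E[(X+Y)^2]=E(X^2)+E(Y^2)+2E(XY)=25+144+0=169$$
Similarly, E(V²) = E(Y²) + E(Z²) = 144 + 81 = 225, so σU = 13 and σV = 15.
$$E(UV)=E[(X+Y)(Y+Z)]=E(XY)+E(XZ)+E(Y^2)+E(YZ)=144$$
$$\rho(U,V)=\frac{144}{(13)(15)}=\frac{144}{195}\approx0.7385$$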
Solution:
$$E(X)=\int_0^1 x f(x)\,dx=\int_0^1 x(4ax)\,dx=4a\int_0^1 x^2\,dx=4a\left[\frac{x^3}{3}\right]_0^1=\frac{4a}{3}$$
$$E(Y)=\int_0^1 y f(y)\,dy=\int_0^1 y(4by)\,dy=4b\int_0^1 y^2\,dy=4b\left[\frac{y^3}{3}\right]_0^1=\frac{4b}{3}$$
Since X and Y are independent, the joint pdf of X and Y is f(x, y) = f(x) f(y) = (4ax)(4by) = 16abxy, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.
$$E(XY)=\int_0^1\int_0^1 xy(16abxy)\,dx\,dy=16ab\int_0^1 y^2\left[\frac{x^3}{3}\right]_0^1dy=\frac{16ab}{3}\int_0^1 y^2\,dy=\frac{16ab}{9}$$
Therefore,
$$\operatorname{Cov}(X,Y)=E(XY)-E(X)E(Y)=\frac{16ab}{9}-\frac{4a}{3}\cdot\frac{4b}{3}=0,\qquad \rho(X,Y)=0$$
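Rank Correlation:
Rank in X 1 2 3 4 5 6 7
Rank in Y 4 3 1 2 6 5 7
Solution:
Rank in X (xi)      1   2   3   4   5   6   7
Rank in Y (yi)      4   3   1   2   6   5   7
di = xi - yi       -3  -1   2   2  -1   1   0    (Σdi = 0)
di²                 9   1   4   4   1   1   0    (Σdi² = 20)
$$r=1-\frac{6\sum d_i^2}{n(n^2-1)}=1-\frac{6(20)}{7(7^2-1)}=1-\frac{120}{336}\approx0.643$$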
If two or more individuals are tied in either classification (with respect to characteristic A or B), that is, if the same value occurs more than once in a series, Spearman's formula for the rank correlation coefficient cannot be applied directly. In this case the tied items are given a common rank, equal to the average of the ranks they would have received had they differed slightly from one another. The next item then receives the rank that follows those already assigned.
As a result, a correction is made in the formula: for every value repeated m times, the correction factor m(m² - 1)/12 is added to Σd², where m is the number of times the item is repeated.
X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70
Solution:
X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70
Rank X (xi)        4   6   2.5  9   6   1   2.5  10  8   6
Rank Y (yi)        5   7   3.5  10  1   6   3.5  9   8   2
di = xi - yi      -1  -1  -1   -1   5  -5  -1    1   0   4
di²                1   1   1    1  25  25   1    1   0  16    (Σdi² = 72)
Correction factors:
In the X series, 75 is repeated twice: C.F. = 2(2² - 1)/12 = 1/2.
In the X series, 64 is repeated thrice: C.F. = 3(3² - 1)/12 = 2.
In the Y series, 68 is repeated twice: C.F. = 2(2² - 1)/12 = 1/2.
Therefore, the rank correlation is
$$r=1-\frac{6\left[\sum d^2+\frac12+2+\frac12\right]}{n(n^2-1)}=1-\frac{6[72+0.5+2+0.5]}{10(10^2-1)}=1-\frac{450}{990}\approx0.545$$
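The tie rule described above (average ranks, plus m(m² - 1)/12 added to Σd² for each repeated value) can be sketched in Python as follows. The function names are illustrative. Rank 1 is assumed to go to the largest value; reversing the direction for both series changes the sign of every di but leaves Σd², and therefore r, unchanged.

```python
def average_ranks(values, descending=True):
    """Rank values 1..n, giving tied values the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of equal values starting at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m times."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(m * (m * m - 1) / 12 for m in counts.values())

def spearman_rho(xs, ys):
    """Spearman's rank correlation with the tie correction added to sum(d^2)."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    corrected = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * corrected / (n * (n * n - 1))

x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(spearman_rho(x, y))
```

When there are no ties every correction term is zero, so the same function also covers the untied formula given later in this section.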
The basic distinction between multiple and partial correlation analysis is as follows. In the former, the degree of relationship between the variable Y and all the other variables X1, X2, ..., Xn taken together is measured. In the latter, the degree of relationship between Y and one of the variables X1, X2, ..., Xn is measured after removing the effect of all the other variables.
Partial correlation:
If there are three variables X1, X2 and X3, there are three coefficients of partial correlation. Each measures the relationship between two of the variables when the third is held constant. Let r12.3 denote the coefficient of partial correlation between X1 and X2 keeping X3 constant. The three coefficients are calculated as
$$r_{12.3}=\frac{r_{12}-r_{13}r_{23}}{\sqrt{(1-r_{13}^2)(1-r_{23}^2)}},\qquad r_{13.2}=\frac{r_{13}-r_{12}r_{23}}{\sqrt{(1-r_{12}^2)(1-r_{23}^2)}},\qquad r_{23.1}=\frac{r_{23}-r_{12}r_{13}}{\sqrt{(1-r_{12}^2)(1-r_{13}^2)}}$$
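All three partial correlation formulas above have the same shape, so one helper covers them with the subscripts permuted. The sketch below is illustrative; `partial_r` is our own name. It is checked against Exercise 5 later in this section (r12 = 0.77, r13 = 0.72, r23 = 0.52), where the answer r12.3 = 0.6673 is given.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation between variables a and b holding c constant:
    (r_ab - r_ac * r_bc) / sqrt((1 - r_ac^2) * (1 - r_bc^2))."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

r12, r13, r23 = 0.77, 0.72, 0.52   # values taken from Exercise 5 below
r12_3 = partial_r(r12, r13, r23)   # X1 and X2, holding X3 constant
r13_2 = partial_r(r13, r12, r23)   # X1 and X3, holding X2 constant
r23_1 = partial_r(r23, r12, r13)   # X2 and X3, holding X1 constant
print(round(r12_3, 4), round(r13_2, 4), round(r23_1, 4))
```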
1. In a trivariate distribution, it is found that r12 = 0.7, r13 = 0.61 and r23 = 0.4. Find the partial correlation coefficients.
Solution:
Multiple correlation:
In multiple correlation, we estimate the value of one of the variables from the values of all the others. The variable whose value we are trying to estimate is called the dependent variable. The variables on which the estimate is based are called the independent variables.
With three variables X1, X2 and X3, the coefficients of multiple correlation are R1.23, R2.13 and R3.12. R1.23 is the coefficient of multiple correlation with X1 as the dependent variable and X2, X3 as the two independent variables. It can be expressed in terms of r12, r23 and r13 as
$$R_{1.23}=\sqrt{\frac{r_{12}^2+r_{13}^2-2r_{12}r_{23}r_{13}}{1-r_{23}^2}},\qquad R_{2.13}=\sqrt{\frac{r_{12}^2+r_{23}^2-2r_{12}r_{23}r_{13}}{1-r_{13}^2}},\qquad R_{3.12}=\sqrt{\frac{r_{13}^2+r_{23}^2-2r_{12}r_{23}r_{13}}{1-r_{12}^2}}$$
Solution:
$$R_{1.23}=\sqrt{\frac{r_{12}^2+r_{13}^2-2r_{12}r_{23}r_{13}}{1-r_{23}^2}}$$
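A small Python sketch of R1.23 follows; the helper name `multiple_R` is ours. R2.13 and R3.12 follow by permuting the arguments, as noted in the comments. The check uses Exercise 5 below, whose stated answer is R1.23 = 0.8561.

```python
import math

def multiple_R(r_12, r_13, r_23):
    """R_{1.23}: multiple correlation of X1 on X2 and X3."""
    num = r_12 ** 2 + r_13 ** 2 - 2 * r_12 * r_13 * r_23
    return math.sqrt(num / (1 - r_23 ** 2))

# The other two coefficients are the same formula with subscripts permuted:
#   R_{2.13} = multiple_R(r_12, r_23, r_13)
#   R_{3.12} = multiple_R(r_13, r_23, r_12)
print(round(multiple_R(0.77, 0.72, 0.52), 4))
```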
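Regression:
Lines of regression:
1. The line of regression of Y on X is given by
$$y-\bar y=r\frac{\sigma_Y}{\sigma_X}(x-\bar x)$$
2. The line of regression of X on Y is given by
$$x-\bar x=r\frac{\sigma_X}{\sigma_Y}(y-\bar y)$$
Regression Coefficients:
1. Regression coefficient of Y on X:
$$b_{YX}=r\frac{\sigma_Y}{\sigma_X}=\frac{\sum(x-\bar x)(y-\bar y)}{\sum(x-\bar x)^2}$$
2. Regression coefficient of X on Y:
$$b_{XY}=r\frac{\sigma_X}{\sigma_Y}=\frac{\sum(x-\bar x)(y-\bar y)}{\sum(y-\bar y)^2}$$
3. Correlation coefficient (taking the common sign of the two regression coefficients):
$$r=\pm\sqrt{b_{XY}\,b_{YX}}$$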
1. From the following data find (i) two regression equations (ii) the coefficient of
correlation between the marks in Economics and Statistics (iii) the most likely marks in
Statistics when marks in Economics are 30.
Marks in Economics 25 28 35 32 31 36 29 38 34 32
Marks in Statistics 43 46 49 41 36 32 31 30 33 39
Solution:
X    Y    X - X̄   Y - Ȳ   (X - X̄)²   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
          (X̄ = 32) (Ȳ = 38)
25   43   -7       5        49         25         -35
28   46   -4       8        16         64         -32
35   49    3      11         9        121          33
32   41    0       3         0          9           0
31   36   -1      -2         1          4           2
36   32    4      -6        16         36         -24
29   31   -3      -7         9         49          21
38   30    6      -8        36         64         -48
34   33    2      -5         4         25         -10
32   39    0       1         0          1           0
Σ 320  380  0  0  140  398  -93
Here,
$$\bar X=\frac{\sum X}{n}=\frac{320}{10}=32,\qquad \bar Y=\frac{\sum Y}{n}=\frac{380}{10}=38$$
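The worked solution above stops after the column totals. The Python sketch below finishes the computation from the same deviation sums. It is an illustrative helper (`regression_coefficients` is our name) that works with deviations from the actual means and gives r the common sign of the two regression coefficients.

```python
import math

def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy, r) using deviations from the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b_yx = sxy / sxx   # regression coefficient of Y on X
    b_xy = sxy / syy   # regression coefficient of X on Y
    r = math.copysign(math.sqrt(b_yx * b_xy), sxy)  # r shares the sign of sxy
    return mx, my, b_yx, b_xy, r

economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
statistics = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
mx, my, b_yx, b_xy, r = regression_coefficients(economics, statistics)
# Most likely Statistics mark for an Economics mark of 30 (regression of Y on X)
y_at_30 = my + b_yx * (30 - mx)
print(round(b_yx, 4), round(b_xy, 4), round(r, 3), round(y_at_30, 2))
```

With the column totals above, this gives b_YX = -93/140 and b_XY = -93/398, a negative r of about -0.39, and a predicted Statistics mark of about 39.33 for an Economics mark of 30. The same helper reproduces b_yx = 271/166 ≈ 1.63 for the test-score and sales data later in this section.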
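Solution:
Since both lines of regression pass through the point of means, (x̄, ȳ) must satisfy both given regression lines:
8x̄ - 10ȳ = -66 …………..(1)
40x̄ - 18ȳ = 214 …………..(2)
Multiplying (1) by 5 and subtracting it from (2) gives 32ȳ = 544, so ȳ = 17. Then from (1), 8x̄ = 170 - 66 = 104, so x̄ = 13.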
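Finding the means amounts to intersecting two straight lines. A minimal Python sketch using Cramer's rule is given below; `solve_2x2` is an illustrative name. The coefficients are those of equations (1) and (2) above.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the two lines are parallel; no unique intersection")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# (1) 8x - 10y = -66   and   (2) 40x - 18y = 214
x_bar, y_bar = solve_2x2(8, -10, -66, 40, -18, 214)
print(x_bar, y_bar)
```

The intersection does not depend on which of the two lines is the regression of Y on X, so the means can be found before that question is settled.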
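$$\sigma_X=\sqrt{\frac1n\sum X^2-\bar X^2}=\sqrt{\frac{47526}{10}-4225}=22.97$$
$$\sigma_Y=\sqrt{\frac1n\sum Y^2-\bar Y^2}=\sqrt{\frac{45784}{10}-4356}=14.91$$
$$r(X,Y)=\frac{\frac1n\sum XY-\bar X\,\bar Y}{\sigma_X\,\sigma_Y}=\frac{\frac{45456}{10}-4290}{(22.97)(14.91)}=0.746$$
Alternatively, taking deviations (X - 65) and (Y - 66) from the means X̄ = 65 and Ȳ = 66:
$$\sigma_X=\sqrt{\frac1n\sum(X-\bar X)^2}=\sqrt{\frac{5276}{10}}=22.97,\qquad \sigma_Y=\sqrt{\frac1n\sum(Y-\bar Y)^2}=\sqrt{\frac{2224}{10}}=14.91$$
$$r(X,Y)=\frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y}=\frac{255.6}{(22.97)(14.91)}=0.746$$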
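6. X and Y are independent random variables with means 5 and 10 and standard deviations 2 and 3 respectively. Obtain r(U, V), where U = 3X + 4Y and V = 3X - Y.
Solution: E(U) = 3(5) + 4(10) = 55 and E(V) = 3(5) - 10 = 5. Since X and Y are independent, E(XY) = E(X)E(Y) = 50. Also E(X²) = 2² + 5² = 29 and E(Y²) = 3² + 10² = 109. Hence
$$\operatorname{Cov}(U,V)=E[9X^2+9XY-4Y^2]-(55)(5)=9(29)+9(50)-4(109)-275=0$$
so r(U, V) = 0.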
If the joint p.d.f. of X and Y is
f(x, y) = 2 - x - y, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f(x, y) = 0, otherwise,
show that r(X, Y) = -1/11.
Solution: We know that
$$r(X,Y)=\frac{E(XY)-E(X)E(Y)}{\sigma_X\,\sigma_Y}$$
Now,
$$E(XY)=\int_0^1\int_0^1 xy(2-x-y)\,dx\,dy=\int_0^1\left[x^2y-\frac{x^3y}{3}-\frac{x^2y^2}{2}\right]_0^1dy=\int_0^1\left(y-\frac y3-\frac{y^2}{2}\right)dy=\frac16$$
$$E(X)=\int_0^1\int_0^1 x(2-x-y)\,dx\,dy=\int_0^1\left[x^2-\frac{x^3}{3}-\frac{x^2y}{2}\right]_0^1dy=\int_0^1\left(1-\frac13-\frac y2\right)dy=\frac5{12}$$
$$E(Y)=\int_0^1\int_0^1 y(2-x-y)\,dx\,dy=\frac5{12}$$
$$E(X^2)=\int_0^1\int_0^1 x^2(2-x-y)\,dx\,dy=\int_0^1\left[\frac{2x^3}{3}-\frac{x^4}{4}-\frac{x^3y}{3}\right]_0^1dy=\int_0^1\left(\frac23-\frac14-\frac y3\right)dy=\frac14$$
$$E(Y^2)=\int_0^1\int_0^1 y^2(2-x-y)\,dx\,dy=\frac14$$
$$\operatorname{Var}(X)=E(X^2)-[E(X)]^2=\frac14-\left(\frac5{12}\right)^2=\frac{11}{144}=\operatorname{Var}(Y)$$
Therefore,
$$r(X,Y)=\frac{\frac16-\frac5{12}\cdot\frac5{12}}{\sqrt{\frac{11}{144}}\,\sqrt{\frac{11}{144}}}=\frac{-\frac1{144}}{\frac{11}{144}}=-\frac1{11}$$
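For a polynomial density on the unit square, every moment reduces to integrals of monomials. Each monomial x^i y^j integrates to 1/((i+1)(j+1)), so the calculation above can be reproduced exactly. The sketch below is illustrative; the dictionary encoding of the density and the names `square_moment` and `correlation` are our assumptions. It checks f(x, y) = 2 - x - y and also f(x, y) = x + y from Exercise 4 later in this section.

```python
from fractions import Fraction
import math

def square_moment(coeffs, a, b):
    """E[X^a Y^b] for a density f(x, y) = sum c * x^i * y^j on the unit square.

    coeffs maps (i, j) -> c; each term integrates to c / ((i + a + 1) * (j + b + 1)).
    """
    return sum(Fraction(c) / ((i + a + 1) * (j + b + 1)) for (i, j), c in coeffs.items())

def correlation(coeffs):
    """Correlation coefficient of X and Y under the polynomial density."""
    ex, ey = square_moment(coeffs, 1, 0), square_moment(coeffs, 0, 1)
    var_x = square_moment(coeffs, 2, 0) - ex ** 2
    var_y = square_moment(coeffs, 0, 2) - ey ** 2
    cov = square_moment(coeffs, 1, 1) - ex * ey
    return float(cov) / math.sqrt(float(var_x * var_y))

f1 = {(0, 0): 2, (1, 0): -1, (0, 1): -1}   # f(x, y) = 2 - x - y
f2 = {(1, 0): 1, (0, 1): 1}                # f(x, y) = x + y (Exercise 4)
print(correlation(f1), correlation(f2))
```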
Rank Correlation Coefficient:
In real life, data are sometimes available only in the form of ranks, or the original observations are rated in grades. For example, if two inspectors are asked to grade the units produced by a machine, we obtain two different sets of grades (ranks). Similarly, if an inspector is asked to rank two sets of observations of a quality characteristic, we get a pair of ranks for each pair of observations. In such circumstances, we obtain the correlation between the two sets of ranks instead of using the observations as they are.
Suppose 1, 2, ..., n are the ranks given to the n values (x1, x2, ..., xn) of X, and 1, 2, ..., n are the ranks given to the n values (y1, y2, ..., yn) of Y. Then the correlation coefficient between X and Y, known as Spearman's rank correlation coefficient, is given by
$$r(X,Y)=1-\frac{6\sum_{i=1}^{n}d_i^2}{n(n^2-1)}$$
where di = (rank of the ith value of X, i.e. xi) - (rank of the ith value of Y, i.e. yi), that is, di = xi - yi.
Note: If one or more of the ranks are repeated within a variable, the following formula is suggested:
$$r(X,Y)=1-\frac{6\left[\sum_{i=1}^{n}d_i^2+\frac1{12}\sum m_x(m_x^2-1)+\frac1{12}\sum m_y(m_y^2-1)\right]}{n(n^2-1)}$$
where mx is the number of times a value is repeated in variable X and my is the number of times a value is repeated in variable Y (one term for each repeated value).
A 3 5 8 4 7 10 2 1 6 9
B 6 4 9 8 1 2 3 10 5 7
Solution:
A (xi)   B (yi)   di = xi - yi   di²
3        6        -3             9
5        4         1             1
8        9        -1             1
4        8        -4            16
7        1         6            36
10       2         8            64
2        3        -1             1
1        10       -9            81
6        5         1             1
9        7         2             4
Σdi = 0, Σdi² = 214
The rank correlation coefficient is
$$r(X,Y)=1-\frac{6\sum_{i=1}^{n}d_i^2}{n(n^2-1)}=1-\frac{6(214)}{10(10^2-1)}=1-\frac{1284}{990}\approx-0.297$$
2. The marks secured by recruits in the selection test X and in the proficiency test Y
are given below:
Serial No. 1 2 3 4 5 6 7 8 9
X 10 15 12 17 13 16 24 14 22
Y 30 42 45 46 33 34 40 35 39
Solution:
X    Y    Rank in X (xi)   Rank in Y (yi)   di = xi - yi   di²
10   30   9                9                 0             0
15   42   5                3                 2             4
12   45   8                2                 6            36
17   46   3                1                 2             4
13   33   7                8                -1             1
16   34   4                7                -3             9
24   40   1                4                -3             9
14   35   6                6                 0             0
22   39   2                5                -3             9
Σdi² = 72
The rank correlation coefficient is
$$r(X,Y)=1-\frac{6\sum_{i=1}^{n}d_i^2}{n(n^2-1)}=1-\frac{6(72)}{9(9^2-1)}=1-\frac{432}{720}=0.4$$
Competitors   1   2   3   4   5   6   7   8   9   10
Judge X       6   5   3   10  2   4   9   7   8   1
Judge Y       5   8   4   7   10  2   1   6   9   3
Judge Z       4   9   8   1   2   3   10  5   7   6
Discuss which pair of judges has the nearest approach to common tastes of beauty.
Solution:
X    Y    Z    d1 = x - y   d1²   d2 = x - z   d2²   d3 = y - z   d3²
6    5    4     1            1     2            4     1            1
5    8    9    -3            9    -4           16    -1            1
3    4    8    -1            1    -5           25    -4           16
10   7    1     3            9     9           81     6           36
2    10   2    -8           64     0            0     8           64
4    2    3     2            4     1            1    -1            1
9    1    10    8           64    -1            1    -9           81
7    6    5     1            1     2            4     1            1
8    9    7    -1            1     1            1     2            4
1    3    6    -2            4    -5           25    -3            9
Σd1² = 158, Σd2² = 158, Σd3² = 214, and n = 10.
$$r(X,Y)=1-\frac{6\sum d_1^2}{n(n^2-1)}=1-\frac{6(158)}{10(10^2-1)}\approx0.042$$
$$r(X,Z)=1-\frac{6\sum d_2^2}{n(n^2-1)}=1-\frac{6(158)}{10(10^2-1)}\approx0.042$$
$$r(Y,Z)=1-\frac{6\sum d_3^2}{n(n^2-1)}=1-\frac{6(214)}{10(10^2-1)}\approx-0.297$$
Therefore, the pairs (X, Y) and (X, Z) have the nearest approach to common tastes of beauty.
4. The following table gives the number of units rejected by two operators X and Y in 8
inspections:
X 15 20 28 12 40 60 20 80
Y 40 30 50 30 20 10 30 60
Obtain rank correlation coefficient between X and Y with respect to the quality of
the product.
Solution:
X Y Ranks in X (xi) Ranks in Y (yi) di = xi - yi di²
15 40 2 6 -4 16
20 30 3.5 4 -0.5 0.25
28 50 5 7 -2 4
12 30 1 4 -3 9
40 20 6 2 4 16
60 10 7 1 6 36
20 30 3.5 4 -0.5 0.25
80 60 8 8 0 0
Σdi² = 81.5
In the X series, 20 is repeated twice: correction factor = m(m² - 1)/12 = 2(2² - 1)/12 = 1/2.
In the Y series, 30 is repeated thrice: correction factor = m(m² - 1)/12 = 3(3² - 1)/12 = 2.
Therefore,
$$r(X,Y)=1-\frac{6\left(81.5+\frac12+2\right)}{8(8^2-1)}=1-\frac{504}{504}=0$$
Since the rank correlation coefficient is 0, we conclude that there is no relationship between the rankings of the two operators X and Y with respect to the quality of the product.
Exercise
X 10 14 18 22 26 30
Y 18 12 24 6 30 36
Solution: r = 0.6
2. Find the extent of the relationship between the ages of husbands (X) and wives (Y) from the following data:
X 21 25 26 24 22 30 19 24 28 32 31 29 21 18
Y 19 20 24 21 21 24 18 22 19 30 27 26 19 18
3. Let the joint p.d.f. of X and Y be given by
f(x, y) = e^(-x), 0 < y < x
f(x, y) = 0, otherwise.
Find the correlation coefficient between X and Y.
Solution: r(X, Y) = 1/√2
4. Let the random variables X and Y have the joint probability density function
f(x, y) = x + y, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f(x, y) = 0, otherwise.
Compute the correlation coefficient between X and Y.
Solution: r(X, Y) = -1/11
5. In a marketing survey, the prices of tea and coffee in a town, based on quality, were found to be as shown below. Can you find any relation between tea and coffee prices?
Price of tea      88   90   95   70   60   75   50
Price of coffee   120  134  150  115  110  140  100
Solution: r = 0.8929. The relation between the prices of tea and coffee is positive.
6. Find the rank correlation for tied observations. Following are the marks obtained by
10 students in a class in two tests.
Students A B C D E F G H I J
Test 1 70 68 67 55 60 60 75 63 60 72
Test 2 65 65 80 60 68 58 75 63 60 70
Solution: r = 0.68.
Regression
Regression is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.
Regression Equations
A regression line can be represented by an algebraic expression that gives the relationship between the two variables. There are two regression equations:
1. The equation which gives the best mean values of X corresponding to given values of Y, that is, the regression equation of X on Y, is
$$X-\bar X=r\frac{\sigma_x}{\sigma_y}(Y-\bar Y)$$
2. The equation which gives the best mean values of Y corresponding to given values of X, that is, the regression equation of Y on X, is
$$Y-\bar Y=r\frac{\sigma_y}{\sigma_x}(X-\bar X)$$
where X̄ and Ȳ are the means of X and Y, σx and σy are the standard deviations of X and Y, and r is the correlation coefficient between X and Y.
Regression Coefficients
1. Regression coefficient of Y on X:
$$b_{yx}=r\frac{\sigma_y}{\sigma_x}=\frac{\sum(X-\bar X)(Y-\bar Y)}{\sum(X-\bar X)^2}$$
2. Regression coefficient of X on Y:
$$b_{xy}=r\frac{\sigma_x}{\sigma_y}=\frac{\sum(X-\bar X)(Y-\bar Y)}{\sum(Y-\bar Y)^2}$$
If the equations of the lines of regression of Y on X and of X on Y are as given above, then the angle θ between the two lines of regression is given by
$$\tan\theta=\frac{1-r^2}{r}\cdot\frac{\sigma_x\,\sigma_y}{\sigma_x^2+\sigma_y^2}$$
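The angle formula above can be checked against the slopes of the two lines drawn in the same (x, y) plane. The Python sketch below does both. The names are illustrative, and |r| is used so that the result is the acute angle between the lines.

```python
import math

def angle_between_regression_lines(r, sigma_x, sigma_y):
    """Acute angle (radians) from tan(theta) = (1 - r^2)/|r| * sx*sy / (sx^2 + sy^2)."""
    if r == 0:
        return math.pi / 2   # the lines are y = mean_y and x = mean_x: perpendicular
    t = (1 - r * r) / abs(r) * sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)
    return math.atan(t)

def angle_from_slopes(r, sigma_x, sigma_y):
    """Same angle computed directly from the slopes of the two lines in the (x, y) plane."""
    m1 = r * sigma_y / sigma_x      # regression of Y on X
    m2 = sigma_y / (r * sigma_x)    # regression of X on Y, solved for y in terms of x
    return math.atan(abs((m2 - m1) / (1 + m1 * m2)))

print(math.degrees(angle_between_regression_lines(0.95, 1.0, 1.0)))
```

When r = ±1 the two lines coincide (angle 0). When r = 0 they are perpendicular. For r = 0.95 with σx = σy, as in the worked example below with X = 1, ..., 9, the lines meet at only about 2.94 degrees.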
Test Scores X      16  22  28  24  29  25  16  23  24
Sales ('00 Rs) Y   35  42  57  40  54  51  34  47  45
The sales Y of a salesman are considered to depend on his ability, as judged by his test score X.
Solution: The regression line of Y on X can be fitted to the data in the following manner.
$$\bar X=\frac{\sum X}{n}=\frac{207}{9}=23,\qquad \bar Y=\frac{\sum Y}{n}=\frac{405}{9}=45$$
X    Y    X - X̄   Y - Ȳ   (X - X̄)²   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
16   35   -7      -10      49         100         70
22   42   -1       -3       1           9          3
28   57    5       12      25         144         60
24   40    1       -5       1          25         -5
29   54    6        9      36          81         54
25   51    2        6       4          36         12
16   34   -7      -11      49         121         77
23   47    0        2       0           4          0
24   45    1        0       1           0          0
Σ 207  405  0  0  166  520  271
$$b_{yx}=r\frac{\sigma_y}{\sigma_x}=\frac{\sum(X-\bar X)(Y-\bar Y)}{\sum(X-\bar X)^2}=\frac{271}{166}=1.63$$
Hence, the regression equation of Y on X is
$$Y-\bar Y=r\frac{\sigma_y}{\sigma_x}(X-\bar X)$$
(Y - 45) = 1.63(X - 23)
Y = 7.51 + 1.63X
X 1 2 3 4 5 6 7 8 9
Y 9 8 10 12 11 13 14 16 15
(i) Obtain the regression equations and the coefficient of correlation.
(ii) Determine an estimate of Y which should correspond on average to X = 6.2.
Solution: The regression lines can be fitted to the data in the following manner.
$$\bar X=\frac{\sum X}{n}=\frac{45}{9}=5,\qquad \bar Y=\frac{\sum Y}{n}=\frac{108}{9}=12$$
X   Y    X - X̄   Y - Ȳ   (X - X̄)²   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
1   9    -4      -3       16          9          12
2   8    -3      -4        9         16          12
3   10   -2      -2        4          4           4
4   12   -1       0        1          0           0
5   11    0      -1        0          1           0
6   13    1       1        1          1           1
7   14    2       2        4          4           4
8   16    3       4        9         16          12
9   15    4       3       16          9          12
Σ 45  108  0  0  60  60  57
(i) The coefficient of regression of X on Y is
$$b_{xy}=r\frac{\sigma_x}{\sigma_y}=\frac{\sum(X-\bar X)(Y-\bar Y)}{\sum(Y-\bar Y)^2}=\frac{57}{60}=0.95$$
Hence, the regression equation of X on Y is
(X - 5) = 0.95(Y - 12)
X = 0.95Y - 6.4
The coefficient of regression of Y on X is
$$b_{yx}=r\frac{\sigma_y}{\sigma_x}=\frac{\sum(X-\bar X)(Y-\bar Y)}{\sum(X-\bar X)^2}=\frac{57}{60}=0.95$$
Hence, the regression equation of Y on X is
(Y - 12) = 0.95(X - 5)
Y = 0.95X + 7.25
The coefficient of correlation is r = √(bxy × byx) = √(0.95 × 0.95) = 0.95.
(ii) When X = 6.2, Y = 0.95(6.2) + 7.25 = 13.14 ≈ 13.
X Y 6 0 --------------------- (1)
0.64X - 4.08 = 0 -------------------- (2)
From (2), X = 4.08/0.64 = 6.375
From (1), X Y 6
6.375 Y 6
Y 0.375
If a regression model has more than one independent variable, the model is called a multiple regression model. In fact, many real-world applications require multiple regression models.
A sample application is stated below:
$$Y=b_0+b_1X_1+b_2X_2+b_3X_3+b_4X_4$$
Here Y represents the economic growth rate of a country, X1 the time period, X2 the size of the population of the country, X3 the level of employment in percentage and X4 the percentage of literacy. b0 is the intercept, and b1, b2, b3 and b4 are the slopes of the variables X1, X2, X3 and X4 respectively. In this regression model, X1, X2, X3 and X4 are the independent variables and Y is the dependent variable.
For a model with two independent variables, Y = b0 + b1X1 + b2X2, the normal equations are
$$\sum Y=nb_0+b_1\sum X_1+b_2\sum X_2$$
$$\sum YX_1=b_0\sum X_1+b_1\sum X_1^2+b_2\sum X_1X_2$$
$$\sum YX_2=b_0\sum X_2+b_1\sum X_1X_2+b_2\sum X_2^2$$
where n is the number of observations. Solving these simultaneous equations gives the coefficients b0, b1 and b2 of the regression model.
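The three normal equations above form a 3×3 linear system. The Python sketch below builds the sums and solves the system by Cramer's rule, which keeps it dependency-free. The names `det3` and `fit_two_predictors` are illustrative, and Cramer's rule is our choice of solver, not the text's; for larger models a matrix library would be the usual route. As data it uses the alloy figures from Exercise 6 below.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_two_predictors(y, x1, x2):
    """Solve the normal equations for (b0, b1, b2) in Y = b0 + b1*X1 + b2*X2."""
    n = len(y)
    s1, s2 = sum(x1), sum(x2)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    sy = sum(y)
    sy1 = sum(v * a for v, a in zip(y, x1))
    sy2 = sum(v * b for v, b in zip(y, x2))
    A = [[n, s1, s2],
         [s1, s11, s12],
         [s2, s12, s22]]
    rhs = [sy, sy1, sy2]
    d = det3(A)
    if d == 0:
        raise ValueError("normal equations are singular (collinear predictors)")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace column `col` of A by the right-hand side
        m = [row[:col] + [rhs[k]] + row[col + 1:] for k, row in enumerate(A)]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

twists = [41, 49, 69, 65, 40, 50, 58, 57, 31, 36, 44, 57, 19, 31, 33, 43]
pct_a = [1, 2, 3, 4] * 4
pct_b = [5] * 4 + [10] * 4 + [15] * 4 + [20] * 4
b0, b1, b2 = fit_two_predictors(twists, pct_a, pct_b)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

For the alloy data this gives approximately b0 = 46.44, b1 = 7.775 and b2 = -1.655.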
Example 1: The annual sales revenue (in crores of rupees) of a product as a function of sales
force (number of salesmen) and annual advertising expenditure (in lakhs of rupees) for the past
10 years are summarized in the following table.
Annual advertising expenditures X2   28  23  38  16  20  28  23  30  26  32
Solution: Let the regression model be Y = b0 + b1X1 + b2X2, where Y is the annual sales revenue, X1 is the sales force and X2 is the annual advertising expenditure.
Y   X1   X2   X1²   X2²   X1X2   YX1   YX2
The solution to the above set of simultaneous equations is b0 = 5.1483, b1 = 0.6190 and b2 = 0.4304.
Exercise:
1. The following table gives the data on rainfall and discharge in a certain river. Obtain the line of regression of Y on X.
Rainfall (inches) X 1.53 1.78 2.60 2.95 3.42
Discharge (1000 c.c) Y 33.5 36.3 40.0 45.8 53.5
Marks in Economics 25 28 35 32 31 36 29 38 34 32
Marks in Statistics 43 46 49 41 36 32 31 30 33 39
3. In a partially destroyed record of an analysis of correlation data, the following results are legible: the two lines of regression are 8X - 10Y + 66 = 0 and 40X - 18Y - 214 = 0. Find the mean values of X and Y.
Solution: X̄ = 13, Ȳ = 17.
1 3 2
4. If r12 ; r23 ; r31 then find the value of R1.23 .
2 4 3
Solution: 0.5
5. For a trivariate distribution, the following correlation coefficients were obtained: r12 = 0.77, r13 = 0.72 and r23 = 0.52. Find the partial correlation coefficient r12.3 and the multiple correlation coefficient R1.23.
Solution: r12.3 = 0.6673, R1.23 = 0.8561
6. The following are data on the number of twists required to break a certain kind of forged alloy bar and the percentages of two alloying elements present in the metal.
No. of twists (Y)            41  49  69  65  40  50  58  57  31  36  44  57  19  31  33  43
Percent of element A (X1)    1   2   3   4   1   2   3   4   1   2   3   4   1   2   3   4
Percent of element B (X2)    5   5   5   5   10  10  10  10  15  15  15  15  20  20  20  20
Meaning
Regression means returning or stepping back to the average value. In statistics, the term regression means simply the average relationship. With the help of regression techniques, we can predict or estimate the value of the dependent variable from given values of the independent variable.
Regression studies the nature of the relationship between variables in order to estimate their most probable values. It establishes a functional relationship between the independent and dependent variables.
Definition
According to Blair, "Regression is the measure of the average relationship between two or more variables in terms of the original units of the data."
According to Taro Yamane, "One of the most frequently used techniques in economics and business research to find a relation between two or more variables that are related causally is regression analysis."
According to Wallis and Roberts, "It is often more important to find out what the relation actually is, in order to estimate or predict one variable, and the statistical technique appropriate in such a case is called regression analysis."
Correlation and regression are both directed towards the common purpose of establishing the degree and direction of the relationship between two or more variables, but their methods differ. The choice of one or the other depends on the purpose. In spite of certain similarities between the two, there are some basic differences in the two approaches, which are summarized below:
CORRELATION
1. Correlation literally means related or sympathetic movement between variables.
2. There is a sort of interdependence, which is mutual.
3. There is no cause and effect relationship. It only shows the existence of some association in the movement of variables.
4. It may be a spurious correlation if the sympathetic movement is due to the influence of an outside variable which has no relevance.
5. It is a relative measure showing association between variables.
6. It is used only for testing and verification of the relationship. It renders only limited information.
7. It is not very useful for further mathematical treatment.
REGRESSION
1. Regression literally means return to the normal, which holds on account of the average relationship.
2. It establishes a functional relationship, which is mathematical, showing the dependence of one variable on the other.
3. It may have a cause and effect relationship.
4. It is a mathematical relationship, which should be interpreted suitably.
5. It is an absolute measure of relationship.
6. Besides verification, it can also be used for estimation and prediction. It renders more comprehensive information.
7. It is very useful for further mathematical treatment.
METHODS OF REGRESSION ANALYSIS
There are two methods:
1. Graphic method (not included in the syllabus)
2. Algebraic method
The algebraic methods for simple linear regression can be broadly divided into the following:
A. Regression lines
B. Regression equations
C. Regression coefficients
A. REGRESSION LINES:
In graphical terms, a regression line is a straight line fitted to the data by the method of least squares. It indicates the best estimate of the mean value of one variable corresponding to a given value of the other. The line of y on x is fitted by minimizing errors in y only, so it cannot be used in reverse to estimate x from y. Therefore, two regression lines are always constructed for the relationship between two variables x and y: one shows the regression of x upon y and the other shows the regression of y upon x.
When two variables are related, we can draw a regression line. The regression line of x on y gives the most probable values of x for any given value of y. In the same manner, the regression line of y on x gives the most probable values of y for any given value of x. Thus there are two regression lines in the case of two variables.
REGRESSION EQUATIONS
A regression equation is the algebraic expression of a regression line. It can be written in terms of r and the standard deviations, or in terms of the regression coefficients.
As there are two regression lines, there are two regression equations. For the two variables x and y, they are the regression equation of x on y and the regression equation of y on x.
I. Regression equation of X on Y
$$(X-\bar X)=r\frac{\sigma_X}{\sigma_Y}(Y-\bar Y)$$
II. Regression equation of Y on X
$$(Y-\bar Y)=r\frac{\sigma_Y}{\sigma_X}(X-\bar X)$$
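Application of Regression Equations when all required values are given
ILLUSTRATION = 01
From the following results, obtain the two regression equations and estimate the yield of crops when the rainfall is 29 cm, and the rainfall when the yield is 600 kg.
          Yield (kg) Y   Rainfall (cm) X
Mean      508.4          26.7
S.D.      36.8           4.6
Coefficient of correlation between yield and rainfall = 0.52
Solution:
To estimate the yield of crops, we use the regression equation of Y on X:
$$(Y-\bar Y)=r\frac{\sigma_Y}{\sigma_X}(X-\bar X)$$
$$Y-508.4=0.52\times\frac{36.8}{4.6}(X-26.7)=4.16(X-26.7)$$
Y = 4.16X - 111.07 + 508.4
Y = 4.16X + 397.33 (regression line of Y on X)
When X = 29 cm: Y = 4.16 × 29 + 397.33 = 517.97 kg
To estimate the rainfall, we use the regression equation of X on Y:
$$(X-\bar X)=r\frac{\sigma_X}{\sigma_Y}(Y-\bar Y)$$
$$X-26.7=0.52\times\frac{4.6}{36.8}(Y-508.4)=0.065(Y-508.4)$$
X - 26.7 = 0.065Y - 33.046
X = 0.065Y - 33.046 + 26.7
X = 0.065Y - 6.346 (regression line of X on Y)
When Y = 600 kg:
X = 0.065 × 600 - 6.346 = 39 - 6.346
X = 32.654 cm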
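When only means, standard deviations and r are given, as in this and the next few illustrations, both regression lines follow directly. The Python sketch below is an illustrative helper (`regression_lines` is our name) that returns the two estimating functions.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return (y_on_x, x_on_y) built from summary statistics:
    y_on_x(x) = mean_y + r*sd_y/sd_x*(x - mean_x)   (estimate Y from X)
    x_on_y(y) = mean_x + r*sd_x/sd_y*(y - mean_y)   (estimate X from Y)"""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y

    def y_on_x(x):
        return mean_y + b_yx * (x - mean_x)

    def x_on_y(y):
        return mean_x + b_xy * (y - mean_y)

    return y_on_x, x_on_y

# Illustration 01: X = rainfall (cm), Y = yield (kg)
y_on_x, x_on_y = regression_lines(mean_x=26.7, mean_y=508.4, sd_x=4.6, sd_y=36.8, r=0.52)
print(round(y_on_x(29), 3), round(x_on_y(600), 3))
```

Because the helper keeps the coefficients unrounded, its results can differ slightly from the worked answers where the text rounds a coefficient first (for example, 1.143 in Illustration 04).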
ILLUSTRATION = 02
Find the regression equation showing the regression of capacity utilization on production from the following data.
                                      Average   Standard Deviation
Production (in lakh units)            35.6      10.5
Capacity utilization (in percentage)  84.8      8.5
Coefficient of correlation = 0.62
Estimate the production when the capacity utilization is 70%.
SOLUTION: Let production and capacity utilization be denoted by X and Y respectively. To estimate production, we use
$$(X-\bar X)=r\frac{\sigma_X}{\sigma_Y}(Y-\bar Y)$$
$$X-35.6=0.62\times\frac{10.5}{8.5}(Y-84.8)=0.7658(Y-84.8)$$
X - 35.6 = 0.7658Y - 64.94
X = 0.7658Y - 64.94 + 35.6
X = 0.7658Y - 29.34 (regression line)
When Y = 70%:
X = 0.7658 × 70 - 29.34 = 53.606 - 29.34
X = 24.266 lakh units
ILLUSTRATION = 03
Karl Pearson's coefficient of correlation between the ages of brothers and sisters in a community was found to be 0.8.
The average age of the brothers was 25 years and that of the sisters 22 years. Their standard deviations were 4 and 5 years respectively.
Find: a. the expected age of a brother when the sister's age is 12 years;
b. the expected age of a sister when the brother's age is 33 years.
Solution:
                     Brother X   Sister Y
Mean age             25 years    22 years
Standard deviation   4           5
To estimate the brother's age, we use the regression equation of X on Y:
$$(X-\bar X)=r\frac{\sigma_X}{\sigma_Y}(Y-\bar Y)$$
$$X-25=0.8\times\frac45(Y-22)=0.64(Y-22)$$
X - 25 = 0.64Y - 14.08
X = 0.64Y - 14.08 + 25
X = 0.64Y + 10.92 (regression line of X on Y)
When Y = 12:
X = 0.64 × 12 + 10.92 = 18.6 years (brother's age)
To estimate the sister's age, we use the regression equation of Y on X (Y = ? when X = 33 years):
$$(Y-\bar Y)=r\frac{\sigma_Y}{\sigma_X}(X-\bar X)$$
$$Y-22=0.8\times\frac54(X-25)=1.0(X-25)$$
Y = X - 25 + 22
Y = X - 3 (regression line of Y on X)
When X = 33:
Y = 33 - 3 = 30 years (sister's age)
ILLUSTRATION = 04
Given the following data, estimate
1. the value of Y when X = 70;
2. the value of X when Y = 90.
                             X-Series   Y-Series
Mean                         18         100
Standard deviation           14         20
Coefficient of correlation = 0.8
SOLUTION
I. Y = ? when X = 70: use the regression equation of Y on X.
$$(Y-\bar Y)=r\frac{\sigma_Y}{\sigma_X}(X-\bar X)$$
$$Y-100=0.8\times\frac{20}{14}(X-18)=1.143(X-18)$$
Y - 100 = 1.143X - 20.574
Y = 1.143X - 20.574 + 100
Y = 1.143X + 79.426 (regression line)
When X = 70:
Y = 1.143 × 70 + 79.426 = 80.01 + 79.426
Y = 159.436
II. X = ? when Y = 90: use the regression equation of X on Y.
$$(X-\bar X)=r\frac{\sigma_X}{\sigma_Y}(Y-\bar Y)$$
$$X-18=0.8\times\frac{14}{20}(Y-100)=0.56(Y-100)$$
X - 18 = 0.56Y - 56
X = 0.56Y - 56 + 18
X = 0.56Y - 38 (regression line)
When Y = 90:
X = 0.56 × 90 - 38 = 50.4 - 38
X = 12.4
ILLUSTRATION = 05
To study the relationship between expenditure on accommodation (X) and expenditure on food (Y), an enquiry into 50 families gave the following results:
SOLUTION
To estimate expenditure on food, we use the regression equation of Y on X.
$$\bar X=\frac{\sum X}{n}=\frac{8500}{50}=170,\qquad \bar Y=\frac{\sum Y}{n}=\frac{9600}{50}=192$$
$$(Y-\bar Y)=r\frac{\sigma_Y}{\sigma_X}(X-\bar X)$$
$$Y-192=0.6\times\frac{20}{60}(X-170)=0.2(X-170)$$
Y - 192 = 0.2X - 34
Y = 0.2X + 158 (regression line)
When X = 200:
Y = 0.2 × 200 + 158 = 40 + 158
Y = Rs. 198
Rs. 198 is required to be spent on food.
ILLUSTRATION = 06
            X-Series   Y-Series
Mean        20         25
Variance    4          9
Coefficient of correlation = 0.75
SOLUTION
Obtaining the two regression lines:
σX = √Variance = √4 = 2, σY = √Variance = √9 = 3
Regression equation of X on Y:
bxy = regression coefficient of X on Y = r σX/σY = 0.75 × 2/3 = 0.5
(X - X̄) = bxy(Y - Ȳ)
X - 20 = 0.5(Y - 25)
X - 20 = 0.5Y - 12.5
X = 0.5Y - 12.5 + 20
X = 0.5Y + 7.5 (regression line)
Regression equation of Y on X:
byx = regression coefficient of Y on X = r σY/σX = 0.75 × 3/2 = 1.125
(Y - Ȳ) = byx(X - X̄)
Y - 25 = 1.125(X - 20)
Y - 25 = 1.125X - 22.5
Y = 1.125X - 22.5 + 25
Y = 1.125X + 2.5 (regression line)
ILLUSTRATION = 07
You are given the following data.
            X-Series   Y-Series
Mean        47         96
Variance    64         81
SOLUTION
σX = √64 = 8, σY = √81 = 9, and r = 0.36.
Regression equation of X on Y:
bxy = r σX/σY = 0.36 × 8/9 = 0.32
X - X̄ = bxy(Y - Ȳ)
X - 47 = 0.32(Y - 96)
X - 47 = 0.32Y - 30.72
X = 0.32Y - 30.72 + 47
X = 0.32Y + 16.28 (regression line)
When Y = 88:
X = 0.32 × 88 + 16.28 = 28.16 + 16.28
X = 44.44
Regression equation of Y on X:
byx = r σY/σX = 0.36 × 9/8 = 0.405
Y - Ȳ = byx(X - X̄)
Y - 96 = 0.405(X - 47)
Y - 96 = 0.405X - 19.035
Y = 0.405X - 19.035 + 96
Y = 0.405X + 76.965 (regression line)
When X = 50:
Y = 0.405 × 50 + 76.965 = 20.25 + 76.965
Y = 97.215
ILLUSTRATION = 08
The following results for the heights and weights of 100 men were calculated.
           Mean      Standard Deviation
Weights    150 lbs   20 lbs
Heights    68"       2.5"
Coefficient of correlation = 0.60
Find an estimate of:
1. the weight of a man whose height is 5' (5' = 60");
2. the height of a man whose weight is 200 lbs.
SOLUTION
Let X = weight and Y = height.
Regression equation of X on Y:
$$(X-\bar X)=b_{xy}(Y-\bar Y),\qquad b_{xy}=r\frac{\sigma_X}{\sigma_Y}=0.6\times\frac{20}{2.5}=4.8$$
X - 150 = 4.8(Y - 68)
X - 150 = 4.8Y - 326.4
X = 4.8Y - 326.4 + 150
X = 4.8Y - 176.4 (regression line)
When Y = 60":
X = 4.8 × 60 - 176.4 = 288 - 176.4
X = 111.6 lbs
Regression equation of Y on X:
$$(Y-\bar Y)=b_{yx}(X-\bar X),\qquad b_{yx}=r\frac{\sigma_Y}{\sigma_X}=0.6\times\frac{2.5}{20}=0.075$$
Y - 68 = 0.075(X - 150)
Y - 68 = 0.075X - 11.25
Y = 0.075X - 11.25 + 68
Y = 0.075X + 56.75 (regression line)
When X = 200 lbs:
Y = 0.075 × 200 + 56.75 = 15 + 56.75
Y = 71.75"
REGRESSION COEFFICIENTS
The regression coefficient is denoted by 'b'. There are two regression equations and therefore two regression coefficients. A regression coefficient measures the change in one series corresponding to a unit change in the other series.
The regression coefficient of X on Y,
$$b_{xy}=r\frac{\sigma_x}{\sigma_y}$$
gives the amount by which the X-variable changes for a unit change in the Y-variable. Similarly, the regression coefficient of Y on X,
$$b_{yx}=r\frac{\sigma_y}{\sigma_x}$$
gives the change in Y for a unit change in X.
These two coefficients measure the change in the dependent variable corresponding to a unit change in the independent variable. They also allow the coefficient of correlation to be calculated directly: the square root of the product of the two regression coefficients gives the value of r, as under:
$$b_{xy}\times b_{yx}=r\frac{\sigma_x}{\sigma_y}\times r\frac{\sigma_y}{\sigma_x}=r^2\qquad\therefore\;r=\sqrt{b_{xy}\times b_{yx}}$$
(r takes the common sign of the two regression coefficients.)
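CALCULATION OF REGRESSION COEFFICIENTS AND ESTIMATION OF UNKNOWN VALUES
INDIVIDUAL SERIES: when actual data are given and deviations are taken from an assumed mean
ILLUSTRATION = 09
From the data given below, find:
a. the regression coefficients;
b. the regression equations;
c. an estimate of the age when the B.P. is 130;
d. an estimate of the B.P. when the age is 50 years;
e. the coefficient of correlation through the regression coefficients.
Age   56   42   72   36   63   47   55   49   38   42   68   60
B.P   147  125  160  118  149  128  150  145  115  140  152  155
SOLUTION
Deviations are taken from the assumed means 47 (age) and 128 (B.P.): dx = X - 47, dy = Y - 128.
Age X   dx    dx²    B.P Y   dy    dy²    dxdy
56       9     81    147     19    361    171
42      -5     25    125     -3      9     15
72      25    625    160     32   1024    800
36     -11    121    118    -10    100    110
63      16    256    149     21    441    336
47       0      0    128      0      0      0
55       8     64    150     22    484    176
49       2      4    145     17    289     34
38      -9     81    115    -13    169    117
42      -5     25    140     12    144    -60
68      21    441    152     24    576    504
60      13    169    155     27    729    351
N = 12, Σdx = 64, Σdx² = 1892, Σdy = 148, Σdy² = 4326, Σdxdy = 2554
$$\bar X=A+\frac{\sum d_x}{N}=47+\frac{64}{12}=52.33,\qquad \bar Y=A+\frac{\sum d_y}{N}=128+\frac{148}{12}=140.33$$
a. Regression coefficients:
$$b_{xy}=\frac{N\sum d_xd_y-\sum d_x\sum d_y}{N\sum d_y^2-(\sum d_y)^2}=\frac{12(2554)-(64)(148)}{12(4326)-(148)^2}=\frac{21176}{30008}=0.7057$$
$$b_{yx}=\frac{N\sum d_xd_y-\sum d_x\sum d_y}{N\sum d_x^2-(\sum d_x)^2}=\frac{21176}{12(1892)-(64)^2}=\frac{21176}{18608}=1.138$$
b. Regression equation of X on Y:
X - 52.33 = 0.7057(Y - 140.33)
X - 52.33 = 0.7057Y - 99.031
X = 0.7057Y - 99.031 + 52.33
X = 0.7057Y - 46.701
Regression equation of Y on X:
Y - 140.33 = 1.138(X - 52.33)
Y - 140.33 = 1.138X - 59.55
Y = 1.138X - 59.55 + 140.33
Y = 1.138X + 80.78
c. Estimation of age (X) when B.P. (Y) is 130:
X = 0.7057 × 130 - 46.701 = 91.741 - 46.701 = 45.04 years
d. Estimation of B.P. (Y) when age (X) is 50 years:
Y = 1.138 × 50 + 80.78 = 56.9 + 80.78 = 137.68
e. Coefficient of correlation: r = √(bxy × byx) = √(0.7057 × 1.138) ≈ 0.896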
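The assumed-mean shortcut used above (and in the next three illustrations) can be sketched in Python as follows. `coefficients_assumed_mean` is an illustrative name. The function also shows that the choice of assumed mean does not affect the coefficients: any A gives the same b values, only with different intermediate sums.

```python
import math

def coefficients_assumed_mean(xs, ys, ax, ay):
    """Regression coefficients from deviations about assumed means ax, ay:
    b_xy = (N*sum(dxdy) - sum(dx)*sum(dy)) / (N*sum(dy^2) - sum(dy)^2)
    b_yx = (N*sum(dxdy) - sum(dx)*sum(dy)) / (N*sum(dx^2) - sum(dx)^2)"""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    sdx, sdy = sum(dx), sum(dy)
    num = n * sum(a * b for a, b in zip(dx, dy)) - sdx * sdy
    b_xy = num / (n * sum(b * b for b in dy) - sdy ** 2)
    b_yx = num / (n * sum(a * a for a in dx) - sdx ** 2)
    mean_x, mean_y = ax + sdx / n, ay + sdy / n
    r = math.copysign(math.sqrt(b_xy * b_yx), num)   # common sign of the coefficients
    return mean_x, mean_y, b_xy, b_yx, r

age = [56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60]
bp = [147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155]
mean_x, mean_y, b_xy, b_yx, r = coefficients_assumed_mean(age, bp, ax=47, ay=128)
print(round(mean_x, 2), round(mean_y, 2), round(b_xy, 4), round(b_yx, 3), round(r, 3))
```

Applied to the sales and purchases data of Illustration 10 with assumed means 67 and 70, the same function gives bxy = 39000/28680 ≈ 1.36 and byx = 39000/63600 ≈ 0.613.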
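ILLUSTRATION = 10
From the following data, obtain the two regression equations. Also calculate the coefficient of correlation from the regression coefficients.
Sales: X       91   97   108  121  67   124  51   73   111  57
Purchases: Y   71   75   69   97   70   91   39   61   80   47
SOLUTION
Deviations are taken from the assumed means 67 (X) and 70 (Y): dx = X - 67, dy = Y - 70.
X     dx    dx²     Y    dy    dy²    dxdy
91    24    576     71    1      1      24
97    30    900     75    5     25     150
108   41   1681     69   -1      1     -41
121   54   2916     97   27    729    1458
67     0      0     70    0      0       0
124   57   3249     91   21    441    1197
51   -16    256     39  -31    961     496
73     6     36     61   -9     81     -54
111   44   1936     80   10    100     440
57   -10    100     47  -23    529     230
Σdx = 230, Σdx² = 11650, Σdy = 0, Σdy² = 2868, Σdxdy = 3900
$$\bar X=A+\frac{\sum d_x}{N}=67+\frac{230}{10}=90,\qquad \bar Y=A+\frac{\sum d_y}{N}=70+\frac{0}{10}=70$$
Regression coefficient of X on Y:
$$b_{xy}=r\frac{\sigma_x}{\sigma_y}=\frac{N\sum d_xd_y-\sum d_x\sum d_y}{N\sum d_y^2-(\sum d_y)^2}=\frac{10(3900)-(230)(0)}{10(2868)-0^2}=\frac{39000}{28680}=1.36$$
Regression coefficient of Y on X:
$$b_{yx}=r\frac{\sigma_y}{\sigma_x}=\frac{N\sum d_xd_y-\sum d_x\sum d_y}{N\sum d_x^2-(\sum d_x)^2}=\frac{39000}{10(11650)-(230)^2}=\frac{39000}{63600}=0.613$$
Regression equation of X on Y: X - 90 = 1.36(Y - 70), i.e. X = 1.36Y - 5.2
Regression equation of Y on X: Y - 70 = 0.613(X - 90), i.e. Y = 0.613X + 14.83
Coefficient of correlation: r = √(bxy × byx) = √(1.36 × 0.613) ≈ 0.913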
ILLUSTRATION = 11
The following data relate to the ages of husbands and wives. Obtain the two regression equations and estimate the most likely age of the husband when the age of the wife is 25 years.
Ages of husbands (X): 25 28 30 32 35 36 38 39 42 55
Ages of wives (Y):    20 26 29 30 25 18 26 35 35 46
SOLUTION
X    dx = X − 36   d²x   Y    dy = Y − 29   d²y   dxdy
25   -11           121   20   -9            81    99
28   -8            64    26   -3            9     24
30   -6            36    29   0             0     0
32   -4            16    30   1             1     -4
35   -1            1     25   -4            16    4
36   0             0     18   -11           121   0
38   2             4     26   -3            9     -6
39   3             9     35   6             36    18
42   6             36    35   6             36    36
55   19            361   46   17            289   323
N = 10   Σdx = 0   Σd²x = 648   Σdy = 0   Σd²y = 598   Σdxdy = 494
X̄ = A + (Σdx/N) × C = 36 + (0/10) × 1 = 36
Ȳ = A + (Σdy/N) × C = 29 + (0/10) × 1 = 29
Regression coefficient of X on Y: bxy = r σx/σy
Regression coefficient of Y on X: byx = r σy/σx
Regression Equation (X on Y)     Regression Equation (Y on X)
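Here is a short sketch of the estimate this illustration asks for. The most likely age of the husband for a given age of the wife comes from the regression line of X on Y. The code below (names are illustrative) computes the sums of squares and products of deviations from the actual means, which here coincide with the assumed means 36 and 29, and evaluates that line at Y = 25. It gives about 32.7 years, a value computed here rather than stated in the text.

```python
husbands = [25, 28, 30, 32, 35, 36, 38, 39, 42, 55]   # X
wives = [20, 26, 29, 30, 25, 18, 26, 35, 35, 46]      # Y

n = len(husbands)
mean_x = sum(husbands) / n
mean_y = sum(wives) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(husbands, wives))
sxx = sum((x - mean_x) ** 2 for x in husbands)
syy = sum((y - mean_y) ** 2 for y in wives)

b_xy = sxy / syy   # regression coefficient of X on Y
b_yx = sxy / sxx   # regression coefficient of Y on X

# Most likely husband's age when the wife is 25: use the X-on-Y line.
husband_at_25 = mean_x + b_xy * (25 - mean_y)
print(round(b_xy, 4), round(b_yx, 4), round(husband_at_25, 2))
```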
ILLUSTRATION = 12
A panel of two judges, P and Q, graded dramatic performances by independently awarding marks as follows.
Performance      1    2    3    4    5    6    7
Marks by P       46   42   44   40   43   41   45
Marks by Q       40   38   36   35   39   37   41
The eighth performance, which judge Q could not attend, was awarded 37 marks by judge P. If judge Q had also been present, how many marks would he be expected to have awarded to the eighth performance?
SOLUTION
Let the marks awarded by judge P be represented by X and those awarded by judge Q by Y. We have to find the value of Y when X = 37. This can be done by finding the regression equation of Y on X.
Computation of the regression equation of Y on X
X    dx = X − 43   d²x   Y    dy = Y − 38   d²y   dxdy
46   3             9     40   2             4     6
42   -1            1     38   0             0     0
44   1             1     36   -2            4     -2
40   -3            9     35   -3            9     9
43   0             0     39   1             1     0
41   -2            4     37   -1            1     2
45   2             4     41   3             9     6
N = 7   Σdx = 0   Σd²x = 28   Σdy = 0   Σd²y = 28   Σdxdy = 21
X̄ = A + (Σdx/N) × C = 43 + (0/7) × 1 = 43
Ȳ = A + (Σdy/N) × C = 38 + (0/7) × 1 = 38
Regression equation of Y on X:
Y − Ȳ = byx (X − X̄), where byx = r σy/σx
Y − 38 = byx (X − 43)
byx = (Σdxdy × N − Σdx × Σdy)/(Σd²x × N − (Σdx)²) = (21 × 7 − 0)/(28 × 7 − 0) = 147/196 = 0.75
Y − 38 = 0.75(X − 43)
Y − 38 = 0.75X − 32.25
Y = 0.75X + 38 − 32.25
Y = 0.75X + 5.75
When X = 37: Y = 0.75 × 37 + 5.75 = 27.75 + 5.75 = 33.5
If judge Q had been present, he would have awarded 33.5 marks.
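For a bivariate frequency table, the regression coefficient of Y on X is computed from step deviations as
byx = r σy/σx = [(Σfdxdy × N − Σfdx × Σfdy)/(Σfd²x × N − (Σfdx)²)] × (C of Y / C of X)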
ILLUSTRATION = 13
The following table gives the ages of husbands and wives for 50 newly married couples. Find the two regression lines. Also estimate (a) the age of the husband when the wife is 20 and (b) the age of the wife when the husband is 30.
Age of wives (Y) \ Age of husbands (X)   20-25   25-30   30-35   Total
16-20                                     9       14      -       23
20-24                                     6       11      3       20
24-28                                     -       -       7       7
Total                                     15      25      10      50
SOLUTION
The class interval for the age of husband (X) is 5 and the class interval for the age of wife (Y) is 4.
dx = (X − 27.5)/5 (A = 27.5, C = 5);   dy = (Y − 22)/4 (A = 22, C = 4)
X (class)              20-25   25-30   30-35
Mid-value of X         22.5    27.5    32.5
dx                     -1      0       1
Y (class)   MV   dy    frequencies                f    fdy   fd²y   fdxdy
16-20       18   -1    9       14      -          23   -23   23     9
20-24       22   0     6       11      3          20   0     0      0
24-28       26   1     -       -       7          7    7     7      7
Total f                15      25      10         N = 50   Σfdy = -16   Σfd²y = 30   Σfdxdy = 16
fdx                    -15     0       10         Σfdx = -5
fd²x                   15      0       10         Σfd²x = 25
fdxdy                  9       0       7          16
Regression Equation (X on Y)     Regression Equation (Y on X)
ILLUSTRATION = 14
The following are the marks obtained by 132 students in Test X and Test Y. Calculate (a) the regression coefficients, (b) the two regression equations and (c) the coefficient of correlation.
Y \ X    30-40   40-50   50-60   60-70   70-80   Total
20-30    2       5       3       -       -       10
30-40    1       8       12      6       -       27
40-50    -       5       22      14      1       42
50-60    -       2       16      9       2       29
60-70    -       1       8       6       1       16
70-80    -       -       2       4       2       8
Total    3       21      63      39      6       132
SOLUTION
dx = (X − 55)/10 (A = 55, C = 10);   dy = (Y − 45)/10 (A = 45, C = 10)
X (class)              30-40   40-50   50-60   60-70   70-80
Mid-value of X         35      45      55      65      75
dx                     -2      -1      0       1       2
Y (class)   MV   dy    frequencies                                f    fdy   fd²y   fdxdy
20-30       25   -2    2       5       3       -       -          10   -20   40     18
30-40       35   -1    1       8       12      6       -          27   -27   27     4
40-50       45   0     -       5       22      14      1          42   0     0      0
50-60       55   1     -       2       16      9       2          29   29    29     11
60-70       65   2     -       1       8       6       1          16   32    64     14
70-80       75   3     -       -       2       4       2          8    24    72     24
Total f                3       21      63      39      6          N = 132   Σfdy = 38   Σfd²y = 232   Σfdxdy = 71
fdx                    -6      -21     0       39      12         Σfdx = 24
fd²x                   12      21      0       39      24         Σfd²x = 96
fdxdy                  10      14      0       27      20         71
X̄ = A + (Σfdx/N) × C = 55 + (24/132) × 10 = 55 + 240/132 = 55 + 1.82 = 56.82
Ȳ = A + (Σfdy/N) × C = 45 + (38/132) × 10 = 45 + 380/132 = 45 + 2.878 = 47.878 ≈ 47.88
Regression coefficient of X on Y:
bxy = [(Σfdxdy × N − Σfdx × Σfdy)/(Σfd²y × N − (Σfdy)²)] × (C of X / C of Y)
    = [(71 × 132 − 24 × 38)/(232 × 132 − 38²)] × (10/10)
    = (9372 − 912)/(30624 − 1444) = 8460/29180 = 0.289
Regression coefficient of Y on X:
byx = [(Σfdxdy × N − Σfdx × Σfdy)/(Σfd²x × N − (Σfdx)²)] × (C of Y / C of X)
    = [(71 × 132 − 24 × 38)/(96 × 132 − 24²)] × (10/10)
    = (9372 − 912)/(12672 − 576) = 8460/12096 = 0.699
Regression equation of X on Y:
X − X̄ = bxy (Y − Ȳ)
X − 56.82 = 0.289(Y − 47.88)
X − 56.82 ≈ 0.29Y − 13.8852
X = 0.29Y − 13.8852 + 56.82
X = 0.29Y + 42.93
Regression equation of Y on X:
Y − Ȳ = byx (X − X̄)
Y − 47.88 = 0.699(X − 56.82)
Y − 47.88 ≈ 0.7X − 39.774
Y = 0.7X − 39.774 + 47.88
Y = 0.7X + 8.11
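The grouped-data formulas used in this illustration can be wrapped in one function that takes the frequency table directly. This is a sketch under stated assumptions. The function `grouped_regression` is a hypothetical helper, blank ("-") cells are entered as 0, and rows are Y classes while columns are X classes, as in the table above. The value r ≈ 0.45 that it returns for part (c) is computed by the sketch, not taken from the text.

```python
import math


def grouped_regression(freq, dx, dy, cx, cy):
    """Regression coefficients and r from a bivariate frequency table.

    freq[i][j] is the frequency in row i (a Y class) and column j (an X class);
    dx[j] and dy[i] are the step deviations of the class mid-values, and
    cx, cy are the class widths of X and Y. Returns (b_xy, b_yx, r).
    """
    cells = [(f, dx[j], dy[i]) for i, row in enumerate(freq) for j, f in enumerate(row)]
    n = sum(f for f, _, _ in cells)
    sfdx = sum(f * u for f, u, _ in cells)
    sfdy = sum(f * v for f, _, v in cells)
    sfd2x = sum(f * u * u for f, u, _ in cells)
    sfd2y = sum(f * v * v for f, _, v in cells)
    sfdxdy = sum(f * u * v for f, u, v in cells)

    num = sfdxdy * n - sfdx * sfdy
    b_xy = num / (sfd2y * n - sfdy ** 2) * (cx / cy)   # X on Y
    b_yx = num / (sfd2x * n - sfdx ** 2) * (cy / cx)   # Y on X
    r = math.copysign(math.sqrt(b_xy * b_yx), num)
    return b_xy, b_yx, r


# Illustration 14: rows are the Y classes 20-30, ..., 70-80 and
# columns are the X classes 30-40, ..., 70-80 ("-" entries become 0).
freq = [
    [2, 5, 3, 0, 0],
    [1, 8, 12, 6, 0],
    [0, 5, 22, 14, 1],
    [0, 2, 16, 9, 2],
    [0, 1, 8, 6, 1],
    [0, 0, 2, 4, 2],
]
b_xy, b_yx, r = grouped_regression(
    freq, dx=[-2, -1, 0, 1, 2], dy=[-2, -1, 0, 1, 2, 3], cx=10, cy=10
)
print(round(b_xy, 3), round(b_yx, 3), round(r, 3))
```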
ILLUSTRATION = 15
X̄ = A + (Σfdx/N) × C = 62.5 + (−43/100) × 5 = 62.5 − 215/100 = 60.35
Ȳ = A + (Σfdy/N) × C = 115 + (−59/100) × 10 = 115 − 590/100 = 109.1
dx = (X − 62.5)/5 (A = 62.5, C = 5);   dy = (Y − 115)/10 (A = 115, C = 10)
Y (class)               90-100   100-110   110-120   120-130
Mid-value of Y          95       105       115       125
dy                      -2       -1        0         1
X (class)   MV    dx    frequencies                            f    fdx   fd²x   fdxdy
50-55       52.5  -2    4        7         5         2          18   -36   72     26
55-60       57.5  -1    6        10        7         4          27   -27   27     18
60-65       62.5  0     6        12        10        7          35   0     0      0
65-70       67.5  1     3        8         6         3          20   20    20     -11
Total f                 19       37        28        16         N = 100   Σfdx = -43   Σfd²x = 119   Σfdxdy = 33
fdy                     -38      -37       0         16         Σfdy = -59
fd²y                    76       37        0         16         Σfd²y = 129
fdxdy                   22       16        0         -5         33
Regression coefficient of X on Y: bxy = r σx/σy
Regression coefficient of Y on X: byx = r σy/σx
bxy = [(Σfdxdy × N − Σfdx × Σfdy)/(Σfd²y × N − (Σfdy)²)] × (C of X / C of Y)
    = [(33 × 100 − (−43) × (−59))/(129 × 100 − (−59)²)] × (5/10)
    = [(3300 − 2537)/(12900 − 3481)] × 0.5 = (763/9419) × 0.5 = 0.0405
byx = [(Σfdxdy × N − Σfdx × Σfdy)/(Σfd²x × N − (Σfdx)²)] × (C of Y / C of X)
    = [(33 × 100 − (−43) × (−59))/(119 × 100 − (−43)²)] × (10/5)
    = [(3300 − 2537)/(11900 − 1849)] × 2 = (763/10051) × 2 = 0.1518
Regression Equation (X on Y)     Regression Equation (Y on X)
ILLUSTRATION = 16
From the following data, find:
a) the most probable value of Y when X is 60,
b) the most probable value of X when Y is 40, and
c) the coefficient of correlation.
X̄ = 53.2, Ȳ = 27.9, byx = −1.5 and bxy = −0.2
SOLUTION
Regression Equation (X on Y)     Regression Equation (Y on X)
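All three parts follow directly from the two regression lines through the means. The sketch below is a minimal check under that reading. It uses the Y-on-X line for part (a), the X-on-Y line for part (b), and for part (c) the geometric mean of the coefficients taken with their common (negative) sign. The values it prints (about 17.7, 50.78 and −0.5477) are computed here, not stated in the text.

```python
import math

mean_x, mean_y = 53.2, 27.9
b_yx, b_xy = -1.5, -0.2   # regression coefficients of Y on X and of X on Y

y_at_60 = mean_y + b_yx * (60 - mean_x)   # Y-on-X line
x_at_40 = mean_x + b_xy * (40 - mean_y)   # X-on-Y line
# r is the geometric mean of the coefficients, taking their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(round(y_at_60, 2), round(x_at_40, 2), round(r, 4))
```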
PRACTICAL PROBLEMS
PROBLEM = 06
Given the following data, calculate:
a. the expected value of Y when X = 60
b. the expected value of X when Y = 120
          X     Y
Mean      65    120
SD        5     10
PROBLEM = 07
Given the following data, estimate the marks in Mathematics for a student who has secured 60 marks in English.
Arithmetic average of marks in Mathematics: 80
Arithmetic average of marks in English: 50
SD of marks in Mathematics: 15
SD of marks in English: 10
Coefficient of correlation: 0.4
[Answer: 86]
PROBLEM = 08
Find the most likely price in Bangalore corresponding to a price of Rs. 70 at Mysore from the following data:
Average price at Mysore = Rs. 65
Average price at Bangalore = Rs. 67
SD of price at Mysore = Rs. 2.5
SD of price at Bangalore = Rs. 3.5
The coefficient of correlation between the prices of the commodity in the two cities is 0.8.
Also estimate the price at Mysore corresponding to a price of Rs. 50 at Bangalore.
[Answer: 72.6 and 55.3]
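When only means, standard deviations and r are given, as in Problems 7 and 8, the estimate comes from the regression line written as (to − mean_to) = r (sd_to/sd_from)(value − mean_from). The helper `predict` below is a hypothetical name for that one-line formula. It reproduces the stated answers 86, 72.6 and 55.3.

```python
def predict(mean_from, sd_from, mean_to, sd_to, r, value):
    """Estimate the 'to' variable from the regression line of 'to' on 'from':
    to - mean_to = r * (sd_to / sd_from) * (value - mean_from)."""
    return mean_to + r * (sd_to / sd_from) * (value - mean_from)


# Problem 7: Mathematics marks for a student with 60 marks in English.
maths = predict(mean_from=50, sd_from=10, mean_to=80, sd_to=15, r=0.4, value=60)
# Problem 8: Bangalore price for Rs. 70 at Mysore, and Mysore price for Rs. 50 at Bangalore.
bangalore = predict(mean_from=65, sd_from=2.5, mean_to=67, sd_to=3.5, r=0.8, value=70)
mysore = predict(mean_from=67, sd_from=3.5, mean_to=65, sd_to=2.5, r=0.8, value=50)
print(round(maths, 1), round(bangalore, 1), round(mysore, 1))
```

The two calls in Problem 8 use different lines: the roles of "from" and "to" are swapped, so r is scaled by sd_to/sd_from in each direction rather than one line being inverted.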
PROBLEM = 09
You are given the following data.
          X     Y
Mean      36    85
S.D.      11    8
PROBLEM = 12
From the data given below, find
a. the two regression equations,
b. the coefficient of correlation between the marks in Economics and Statistics, and
c. the most likely marks in Statistics when the marks in Economics are 30.
Marks in Economics (X):  25 28 35 32 31 36 39 38 34 32
Marks in Statistics (Y): 43 46 49 41 36 32 31 30 33 39
PROBLEM = 14
The following table shows the frequency distribution of couples classified according to their ages.
a) Obtain the two regression coefficients.
b) Estimate the age of the husband when the wife's age is 28 years.
c) Calculate the coefficient of correlation.
Wife's age in years (Y) \ Husband's age in years (X)   20-25   25-30   30-35   35-40   Total
15-20     20   10   3    2    35
20-25     4    18   6    4    32
25-30     -    5    11   -    16
30-35     -    -    2    -    2
35-40     -    -    -    5    5
Total     24   33   22   11   90
[Answers: r = 0.612, X = 22.5, Y = 28.6, b = 31.7, byx = 0.558]
PROBLEM = 15
From the following data, (a) estimate X when Y = 30 and (b) estimate Y when X = 20.
Y \ X    5-15   15-25   25-35   35-45   Total
0-10     1      1       -       -       2
10-20    3      6       5       1       15
20-30    1      8       9       2       20
30-40    -      3       9       3       15
40-50    -      -       4       4       8
Total    5      18      27      10      60
[Answer: a) 28.7   b) 22.31]
PROBLEM = 16
From the following data, calculate
a) the regression coefficients, and b) the coefficient of correlation based on bxy and byx.
X \ Y    30-35   35-40   40-45   45-50   Total
25-30    20      10      3       2       35
30-35    4       28      6       4       42
35-40    -       5       11      -       16
40-45    -       -       2       -       2
45-50    -       -       -       5       5
Total    24      43      22      11      100
[Answer: X̄ = 32.5, Ȳ = 38.5, bxy = 0.6744, byx = 0.5576, r = 0.6132]
PROBLEM = 17
Calculate the two regression coefficients. Estimate the value of X when Y = 49, and also calculate the coefficient of correlation based on bxy and byx.
X: 43 44 46 40 44 42 45 42 38 40 42 57
Y: 29 31 19 18 19 27 27 29 41 30 26 10
[Answer: X = 64.8, Y = ?, bxy = −0.44, byx = −1.2198, r = −0.732]
PROBLEM = 18
From the following bivariate table, calculate
a) the two regression coefficients, and
b) the coefficient of correlation based on bxy and byx.
Y \ X    59.9   79.5   99.5   119.5   139.5   159.5   179.5   Total
2.25     3      4      3      6       2       1       1       20
7.25     2      3      5      10      3       1       1       25
12.25    5      4      6      11      5       3       3       37
17.25    10     11     12     15      12      15      10      85
22.25    4      2      3      10      7       5       6       37
27.25    1      1      2      8       8       5       4       29
32.25    1      1      1      10      5       4       5       27
Total    26     26     32     70      42      34      30      260
[Answer: X = 17.80, Y = 122.42, bxy = 0.05, byx = 1.06, r = 0.230]
School of Distance Education
Here the normal equations are, 47.14 = 90 B + 20 A --- (1)
y 101.3 (1.196) x
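To obtain the regression line of y on x in the form y = ax + b for the given data $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, the following normal equations for fitting y = ax + b are to be solved:

$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i \qquad (1)$$

$$\sum_{i=1}^{n} y_i = a\sum_{i=1}^{n} x_i + nb \qquad (2)$$

Let us transform x and y to X and Y as $X = x - \bar{x}$ and $Y = y - \bar{y}$, where $\bar{x}$ and $\bar{y}$ are the means of x and y respectively. The normal equations for fitting a straight line connecting X and Y in the form Y = aX + b are:

$$\sum_{i=1}^{n} X_i Y_i = a\sum_{i=1}^{n} X_i^2 + b\sum_{i=1}^{n} X_i \qquad (3)$$

$$\sum_{i=1}^{n} Y_i = a\sum_{i=1}^{n} X_i + nb \qquad (4)$$

But here $\sum_{i=1}^{n} X_i = \sum_{i=1}^{n}(x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n}(y_i - \bar{y}) = 0$. Hence (3) gives $\sum X_iY_i = a\sum X_i^2 + b \cdot 0$, so

$$a = \frac{\sum_{i=1}^{n} X_iY_i}{\sum_{i=1}^{n} X_i^2} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\operatorname{Cov}(x,y)}{\operatorname{var}(x)},$$

and (4) gives $0 = a \cdot 0 + nb$, so $b = 0$.

Then the straight line is $Y = \frac{\operatorname{Cov}(x,y)}{\operatorname{var}(x)}X$. Hence the regression line of y on x is

$$y - \bar{y} = \frac{\operatorname{Cov}(x,y)}{\operatorname{var}(x)}(x - \bar{x}),$$

and, in the same way, the regression line of x on y is

$$x - \bar{x} = \frac{\operatorname{Cov}(x,y)}{\operatorname{var}(y)}(y - \bar{y}).$$

In the regression line of y on x, the coefficient of x, $\frac{\operatorname{Cov}(x,y)}{\operatorname{var}(x)} = \frac{P_{xy}}{\sigma_x^2}$, is known as the regression coefficient of y on x, denoted by $b_{yx}$. In the regression line of x on y, the coefficient of y, $\frac{\operatorname{Cov}(x,y)}{\operatorname{var}(y)} = \frac{P_{xy}}{\sigma_y^2}$, is known as the regression coefficient of x on y, denoted by $b_{xy}$.
The regression line of y on x helps us to predict the value of y for a given value of x, and the regression line of x on y helps us to predict the value of x for a given value of y.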
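The derivation can be checked numerically. Solve the normal equations (1) and (2) directly with Cramer's rule, then compare the slope with Cov(x, y)/var(x) and the intercept with ȳ − a x̄. The sketch below borrows the age and blood-pressure data from the example that follows and is only an illustration. It gives a ≈ 1.1148 and b ≈ 79.03 both ways.

```python
ages = [66, 38, 56, 42, 72, 36, 63, 47, 55, 45]            # x
bp = [145, 124, 147, 125, 160, 118, 149, 128, 150, 124]    # y

n = len(ages)
sx, sy = sum(ages), sum(bp)
sxx = sum(x * x for x in ages)
sxy = sum(x * y for x, y in zip(ages, bp))

# Normal equations (1) and (2):
#   sxy = a*sxx + b*sx
#   sy  = a*sx  + n*b
# solved by Cramer's rule.
det = sxx * n - sx * sx
a = (sxy * n - sx * sy) / det
b = (sxx * sy - sx * sxy) / det

# The same slope from Cov(x, y) / var(x), both with divisor n.
mean_x, mean_y = sx / n, sy / n
cov = sxy / n - mean_x * mean_y
var_x = sxx / n - mean_x ** 2
print(a, cov / var_x, b, mean_y - a * mean_x)
```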
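Age (x):  66 38 56 42 72 36 63 47 55 45
B.P. (y): 145 124 147 125 160 118 149 128 150 124
Solution:
The regression line of y on x is
$$y - \bar{y} = \frac{P_{xy}}{\sigma_x^2}(x - \bar{x}),$$
where $P_{xy} = \operatorname{cov}(X,Y)$ and $\sigma_x^2 = V(X)$.
Using the given data, we find the mean of x, the mean of y, cov(X,Y) and V(X). The column totals of x, y, x² and xy are Σx = 520, Σy = 1370, Σx² = 28408 and Σxy = 72765.
Mean of x = 520/10 = 52, mean of y = 1370/10 = 137
cov(X,Y) = (1/n)Σxy − x̄ȳ = 72765/10 − 52 × 137 = 7276.5 − 7124 = 152.5
V(X) = (1/n)Σx² − x̄² = 28408/10 − 52² = 2840.8 − 2704 = 136.8
y − 137 = (152.5/136.8)(x − 52), that is, y = 1.1148x + 79.03
Then the blood pressure of a man whose age is x = 55 can be obtained by substituting x = 55 in the derived regression equation of y on x. This gives the blood pressure y = 1.1148 × 55 + 79.03 ≈ 140.34.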
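Given (n = 10): Σx = 130, Σy = 200, Σx² = 2288, Σy² = 5506, Σxy = 3467.
Obtain the regression line of Y on X. Find y when x = 16.
Solution:
The regression line of y on x is $y - \bar{y} = \frac{P_{xy}}{\sigma_x^2}(x - \bar{x})$, where $P_{xy} = \operatorname{cov}(X,Y)$ and $\sigma_x^2 = V(X)$.
cov(X,Y) = (1/n)Σxy − x̄ȳ = 3467/10 − (130/10)(200/10) = 346.7 − 260 = 86.7
V(X) = (1/n)Σx² − x̄² = 2288/10 − (130/10)² = 228.8 − 169 = 59.8
y − 20 = (86.7/59.8)(x − 13), that is, y = 1.4498x + 1.1526.
When x = 16, y = 1.4498 × 16 + 1.1526 ≈ 24.35.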
If there is a linear relation between the variables x and y, the degree of the linear relation is measured by the coefficient of correlation. If all the given $(x_i, y_i)$ points almost satisfy a linear relation, we say that there is a high degree of linear relation between the variables. If the linear relation fitted to the variables is such that an increase in one variable results in an increase in the other as well, there is a direct (or positive) correlation between the variables. On the other hand, if the linear relation fitted is such that an increase in one variable results in a decrease in the other, there is an inverse (or negative) correlation between the variables. If there is no linear relation between the variables, the correlation is zero.
Applied Statistics Page 32
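Karl Pearson's coefficient of correlation is

$$r_{xy} = \frac{P_{xy}}{\sigma_x\sigma_y} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}} = \frac{\frac{1}{n}\sum_{i=1}^{n}x_iy_i - \bar{x}\bar{y}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i^2 - \bar{y}^2}}$$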
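The two forms of Pearson's formula above, the deviation form and the raw-sum form, give the same number. The sketch below checks this on the fathers' and sons' heights worked out later in this section, where r = 3/√(4.5 × 5.5) ≈ 0.603. Variable names are illustrative.

```python
import math

heights_fathers = [65, 66, 67, 67, 68, 69, 70, 72]   # x
heights_sons = [67, 68, 65, 68, 72, 72, 69, 71]      # y
n = len(heights_fathers)
mx = sum(heights_fathers) / n
my = sum(heights_sons) / n

# Deviation form: covariance over the product of the standard deviations.
cov = sum((x - mx) * (y - my) for x, y in zip(heights_fathers, heights_sons)) / n
sd_x = math.sqrt(sum((x - mx) ** 2 for x in heights_fathers) / n)
sd_y = math.sqrt(sum((y - my) ** 2 for y in heights_sons) / n)
r_deviation = cov / (sd_x * sd_y)

# Raw-sum form: (1/n)Σxy − x̄ȳ over sqrt((1/n)Σx² − x̄²) sqrt((1/n)Σy² − ȳ²).
sxy = sum(x * y for x, y in zip(heights_fathers, heights_sons)) / n - mx * my
sxx = sum(x * x for x in heights_fathers) / n - mx ** 2
syy = sum(y * y for y in heights_sons) / n - my ** 2
r_raw = sxy / math.sqrt(sxx * syy)

print(round(r_deviation, 4), round(r_raw, 4))
```

The raw-sum form is convenient for hand computation. The deviation form is less prone to rounding trouble when the values are large compared with their spread.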
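Theorem: For two variables x and y, $-1 \le r_{xy} \le 1$, where $r_{xy}$ is Pearson's coefficient of correlation.
Proof:
Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be the observations on x and y. Consider $\frac{x_i - \bar{x}}{\sigma_x}$ and $\frac{y_i - \bar{y}}{\sigma_y}$, where $\bar{x}, \bar{y}$ are the means and $\sigma_x, \sigma_y$ are the standard deviations of x and y respectively.
We have
$$\frac{1}{n}\sum_{i}\left(\frac{x_i-\bar{x}}{\sigma_x} \pm \frac{y_i-\bar{y}}{\sigma_y}\right)^2 \ge 0,$$
because it is an average of squares of real numbers. On expansion,
$$\frac{1}{\sigma_x^2}\cdot\frac{1}{n}\sum_{i}(x_i-\bar{x})^2 + \frac{1}{\sigma_y^2}\cdot\frac{1}{n}\sum_{i}(y_i-\bar{y})^2 \pm \frac{2}{\sigma_x\sigma_y}\cdot\frac{1}{n}\sum_{i}(x_i-\bar{x})(y_i-\bar{y}) \ge 0,$$
$$\frac{\sigma_x^2}{\sigma_x^2} + \frac{\sigma_y^2}{\sigma_y^2} \pm 2\,\frac{\operatorname{Cov}(x,y)}{\sigma_x\sigma_y} \ge 0, \quad\text{that is,}\quad 1 + 1 \pm 2\,\frac{P_{xy}}{\sigma_x\sigma_y} \ge 0.$$
So $2(1 \pm r_{xy}) \ge 0$, i.e. $1 + r_{xy} \ge 0$ and $1 - r_{xy} \ge 0$. Hence
$$-1 \le r_{xy} \le 1.$$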
Remark: We have the regression coefficient of y on x, $b_{yx} = \frac{P_{xy}}{\sigma_x^2}$, and the regression coefficient of x on y, $b_{xy} = \frac{P_{xy}}{\sigma_y^2}$. The geometric mean of these regression coefficients gives the magnitude of the coefficient of correlation $r_{xy}$, since
$$b_{yx}\,b_{xy} = \frac{P_{xy}^2}{\sigma_x^2\sigma_y^2} = r_{xy}^2.$$
The sign of the correlation is determined by the sign of the covariance $P_{xy}$ between x and y. If $P_{xy}$ is positive, $r_{xy}$ is positive, and if $P_{xy}$ is negative, $r_{xy}$ is negative.
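Proof:
$$r_{xy} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
Let $u = \frac{x - A}{c}$ and $v = \frac{y - B}{d}$, where A, B are constants and c, d are positive constants. Then $\bar{u} = \frac{\bar{x} - A}{c}$ and $\bar{v} = \frac{\bar{y} - B}{d}$, so that
$$u_i - \bar{u} = \frac{x_i - A}{c} - \frac{\bar{x} - A}{c} = \frac{x_i - \bar{x}}{c}, \qquad v_i - \bar{v} = \frac{y_i - B}{d} - \frac{\bar{y} - B}{d} = \frac{y_i - \bar{y}}{d}.$$
Hence
$$r_{uv} = \frac{\frac{1}{n}\sum_{i=1}^{n}(u_i-\bar{u})(v_i-\bar{v})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(u_i-\bar{u})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}(v_i-\bar{v})^2}} = \frac{\frac{1}{cd}\cdot\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\frac{1}{cd}\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}} = \frac{\frac{1}{cd}P_{xy}}{\frac{1}{cd}\sigma_x\sigma_y}$$
$$r_{uv} = r_{xy}.$$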
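A numerical illustration of this result: shifting the origin and rescaling both variables leaves r unchanged, while a negative scale factor keeps the magnitude but reverses the sign, which is why the proof takes c and d positive. In the sketch below the origins and scales (A = 16, c = 2, B = 27, d = 5) are hypothetical choices, and the data are those of the Karl Pearson problem later in this section.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r with divisor n (the divisor cancels in the ratio)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / math.sqrt(vx * vy)


xs = [10, 12, 13, 16, 17, 20, 25]
ys = [19, 22, 26, 27, 29, 33, 37]
A, c = 16, 2     # hypothetical origin and scale for x
B, d = 27, 5     # hypothetical origin and scale for y
us = [(x - A) / c for x in xs]
vs = [(y - B) / d for y in ys]

print(round(pearson_r(xs, ys), 4), round(pearson_r(us, vs), 4))
# With a negative scale factor the magnitude is kept but the sign flips.
print(round(pearson_r([(x - A) / -c for x in xs], vs), 4))
```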
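Problem: Find the coefficient of correlation for the following data on X and Y.
X: 65 66 67 67 68 69 70 72
Y: 67 68 65 68 72 72 69 71
Solution:
Coefficient of correlation, $r_{xy} = \frac{P_{xy}}{\sigma_x\sigma_y}$, where
$$P_{xy} = \frac{1}{n}\sum_{i=1}^{n}x_iy_i - \bar{x}\bar{y}, \qquad \sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2, \qquad \sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n}y_i^2 - \bar{y}^2.$$
From the table of x, y, x², y² and xy: Σx = 544, Σy = 552, Σx² = 37028, Σy² = 38132, Σxy = 37560.
x̄ = (1/n)Σx = 544/8 = 68;   ȳ = (1/n)Σy = 552/8 = 69
Pxy = (1/n)Σxy − x̄ȳ = 37560/8 − 68 × 69 = 4695 − 4692 = 3
σx² = (1/n)Σx² − x̄² = 37028/8 − 68² = 4628.5 − 4624 = 4.5
σy² = (1/n)Σy² − ȳ² = 38132/8 − 69² = 4766.5 − 4761 = 5.5
Coefficient of correlation, $r_{xy} = \frac{P_{xy}}{\sigma_x\sigma_y} = \frac{3}{\sqrt{4.5 \times 5.5}} = 0.603$.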
Problem: Calculate Karl Pearson's coefficient of correlation for the following data:
x: 10 12 13 16 17 20 25
y: 19 22 26 27 29 33 37
Solution:
Coefficient of correlation, r = Cov(X,Y)/(S.D.(X) × S.D.(Y)).
The problem can be solved by simply following the steps shown in the example above. For computational ease, however, it can also be solved using U = X − 16 and V = Y − 27. Since the correlation coefficient is unaffected by a change of origin, the correlation between U and V, r = Cov(U,V)/(S.D.(U) × S.D.(V)), equals that between X and Y.
x    y    U = X − 16   V = Y − 27   U²   V²    UV
10   19   -6           -8           36   64    48
12   22   -4           -5           16   25    20
13   26   -3           -1           9    1     3
16   27   0            0            0    0     0
17   29   1            2            1    4     2
20   33   4            6            16   36    24
25   37   9            10           81   100   90
Total     ΣU = 1       ΣV = 4       ΣU² = 159   ΣV² = 230   ΣUV = 187
Cov(U,V) = 187/7 − (1/7)(4/7) = 26.63;   V(U) = 159/7 − (1/7)² = 22.69;   V(V) = 230/7 − (4/7)² = 32.53
Now, the correlation between U and V is r = 26.63/√(22.69 × 32.53) = 0.98.
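Since $r_{xy} = \frac{P_{xy}}{\sigma_x\sigma_y}$, the regression coefficient of y on x is $b_{yx} = \frac{P_{xy}}{\sigma_x^2} = r_{xy}\frac{\sigma_y}{\sigma_x}$, and the regression coefficient of x on y is $b_{xy} = \frac{P_{xy}}{\sigma_y^2} = r_{xy}\frac{\sigma_x}{\sigma_y}$.
Hence the regression equations are
$$y - \bar{y} = r_{xy}\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \qquad (1)$$
$$x - \bar{x} = r_{xy}\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \qquad (2)$$
The regression equation of x on y can be rewritten as
$$y - \bar{y} = \frac{\sigma_y}{r_{xy}\sigma_x}(x - \bar{x}) \qquad (3)$$
Now the regression equation of y on x [equation (1)] and that of x on y [equation (3)] can be written in the form y = mx + c as follows:
$$y = r_{xy}\frac{\sigma_y}{\sigma_x}x + \left(\bar{y} - r_{xy}\frac{\sigma_y}{\sigma_x}\bar{x}\right) \qquad (1)$$
$$y = \frac{\sigma_y}{r_{xy}\sigma_x}x + \left(\bar{y} - \frac{\sigma_y}{r_{xy}\sigma_x}\bar{x}\right) \qquad (3)$$
From these, the slopes of the two regression lines are $m_1 = r_{xy}\frac{\sigma_y}{\sigma_x}$ and $m_2 = \frac{\sigma_y}{r_{xy}\sigma_x}$. If θ is the angle between the lines,
$$\tan\theta = \frac{m_1 - m_2}{1 + m_1m_2} = \frac{r_{xy}\frac{\sigma_y}{\sigma_x} - \frac{\sigma_y}{r_{xy}\sigma_x}}{1 + \frac{\sigma_y^2}{\sigma_x^2}} = \frac{\frac{r_{xy}^2 - 1}{r_{xy}}\cdot\frac{\sigma_y}{\sigma_x}}{\frac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}}$$
$$\tan\theta = \frac{r_{xy}^2 - 1}{r_{xy}}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$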
Remarks:
(i) For two variables x and y, if $r_{xy} = \pm 1$, we get $\tan\theta = 0$. This implies that the angle between the regression lines is $\theta = \tan^{-1}0 = 0^\circ$. That is, if a perfect linear relation exists between x and y (whether direct or inverse), the angle between the regression lines is zero; in other words, the two regression lines coincide.
(ii) If $r_{xy} = 0$, we get $\tan\theta = \infty$. This implies that the angle between the regression lines is $\theta = \tan^{-1}\infty = 90^\circ$. That is, if no linear relation exists between x and y, the two regression lines are perpendicular.
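The angle formula and both remarks can be checked with a short function. It is a sketch under these assumptions: it returns the acute angle in degrees by taking the absolute value of the expression above, the name `angle_between_regression_lines` is hypothetical, and the sample values r = 0.5, σx = 5, σy = 10 are illustrative. For those values, the slopes 1 and 4 give tan θ = 3/5.

```python
import math


def angle_between_regression_lines(r, sd_x, sd_y):
    """Acute angle (in degrees) between the y-on-x and x-on-y regression lines,
    from tan(theta) = (1 - r^2)/|r| * sd_x*sd_y / (sd_x^2 + sd_y^2)."""
    if r == 0:
        return 90.0          # no linear relation: the lines are perpendicular
    t = (1 - r * r) / abs(r) * sd_x * sd_y / (sd_x ** 2 + sd_y ** 2)
    return math.degrees(math.atan(t))


print(angle_between_regression_lines(1.0, 5, 10))    # perfect correlation: 0.0
print(angle_between_regression_lines(0.0, 5, 10))    # no correlation: 90.0
print(round(angle_between_regression_lines(0.5, 5, 10), 2))
```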
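If there are two regression lines, they intersect at a point. The point of intersection can be obtained by solving the regression equations for x and y, as follows.
We have the regression equation of y on x,
$$y - \bar{y} = r_{xy}\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \qquad (1)$$
and the regression equation of x on y,
$$x - \bar{x} = r_{xy}\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \qquad (2)$$
Putting (2) in (1) gives
$$y - \bar{y} = r_{xy}\frac{\sigma_y}{\sigma_x}\cdot r_{xy}\frac{\sigma_x}{\sigma_y}(y - \bar{y}) = r_{xy}^2(y - \bar{y}),$$
so $(1 - r_{xy}^2)(y - \bar{y}) = 0$, and for $r_{xy}^2 \ne 1$ this gives $y = \bar{y}$. Putting $y = \bar{y}$ in (2) gives $x - \bar{x} = 0$, that is, $x = \bar{x}$. Hence the regression lines intersect at the point of means $(\bar{x}, \bar{y})$.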
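If the first line $a_1x + b_1y + c_1 = 0$ is taken as the regression line of y on x and the second, $a_2x + b_2y + c_2 = 0$, as the regression line of x on y, then $b_{yx} = -\frac{a_1}{b_1}$ and $b_{xy} = -\frac{b_2}{a_2}$, so that
$$b_{yx}b_{xy} = \frac{a_1b_2}{b_1a_2}.$$
Hence, if $b_{yx}b_{xy} = \frac{a_1b_2}{b_1a_2} \le 1$, our assumption regarding the regression lines is confirmed. Otherwise, the first line is the regression line of x on y and the second is the regression line of y on x. Then the regression coefficients are $b_{xy} = -\frac{b_1}{a_1}$ and $b_{yx} = -\frac{a_2}{b_2}$, and
$$r_{xy}^2 = b_{yx}b_{xy} = \frac{a_2b_1}{b_2a_1},$$
which is the reciprocal of the value of $r_{xy}^2$ obtained under the previous assumption.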
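Solution:
Solving the two given regression lines, $5x - 6y + 90 = 0$ (1) and $15x - 8y - 130 = 0$ (2), we get $(\bar{x}, \bar{y})$:
(2) − 3 × (1): $10y - 400 = 0$, so $\bar{y} = 40$.
Putting y = 40 in (1): $5x - 6 \times 40 + 90 = 0$, so $\bar{x} = 30$.
Hence $(\bar{x}, \bar{y}) = (30, 40)$.
Assume the first line is the regression line of Y on X. Then it can be expressed as $y = \frac{5}{6}x + \frac{90}{6}$, which implies that the regression coefficient of Y on X is $b_{yx} = \frac{5}{6}$. Taking the second line as the regression line of X on Y, $x = \frac{8}{15}y + \frac{130}{15}$, so $b_{xy} = \frac{8}{15}$.
Then
$$b_{yx}b_{xy} = \frac{a_1b_2}{a_2b_1} = \frac{5}{6}\times\frac{8}{15} = 0.444 < 1,$$
so the assumption is valid.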
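Solution:
Assume that the first line, with x- and y-coefficients 14 and 12, is the regression line of Y on X. Then $y = -\frac{14}{12}x + \text{constant}$, which implies that the regression coefficient of Y on X is $b_{yx} = -\frac{14}{12}$. The second line, with x- and y-coefficients 12 and 21, is assumed to be the regression line of X on Y. Then $x = -\frac{21}{12}y + \text{constant}$, and the regression coefficient of X on Y is $b_{xy} = -\frac{21}{12}$.
Then
$$b_{yx}b_{xy} = \frac{a_1b_2}{a_2b_1} = \frac{14}{12}\times\frac{21}{12} = 2.04 > 1.$$
Hence our assumptions about the regression lines are NOT true.
Taking the second line as the regression line of Y on X, $y = -\frac{12}{21}x + \text{constant}$, and the regression coefficient of Y on X is $-\frac{12}{21}$. Taking the first line as the regression line of X on Y, $x = -\frac{12}{14}y + \text{constant}$, and the regression coefficient of X on Y is $-\frac{12}{14}$.
Then
$$r_{xy}^2 = b_{yx}b_{xy} = \frac{12}{21}\times\frac{12}{14} = 0.4898.$$
Since the regression coefficients are negative, the correlation coefficient is $r_{xy} = -\sqrt{0.4898} \approx -0.70$.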
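The rule used in the last two solutions can be written as a small function: take the first line as y on x, and if the product of the coefficients exceeds 1, swap the roles. This is a sketch under the sign convention a₁x + b₁y + c₁ = 0 and a₂x + b₂y + c₂ = 0; the constants do not affect the slopes and so are not passed in. The function name `identify_regression_lines` is hypothetical.

```python
import math


def identify_regression_lines(a1, b1, a2, b2):
    """For lines a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0, decide which is
    the regression line of y on x and return (b_yx, b_xy, r)."""
    b_yx, b_xy = -a1 / b1, -b2 / a2          # first line taken as y on x
    if b_yx * b_xy > 1:                      # impossible, since r^2 <= 1
        b_yx, b_xy = -a2 / b2, -b1 / a1      # swap the roles of the two lines
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    return b_yx, b_xy, r


# 5x - 6y + 90 = 0 and 15x - 8y - 130 = 0: the first assumption holds.
print(identify_regression_lines(5, -6, 15, -8))
# Lines with x, y coefficients (14, 12) and (12, 21): the roles must be swapped.
b_yx, b_xy, r = identify_regression_lines(14, 12, 12, 21)
print(round(b_yx, 4), round(b_xy, 4), round(r, 4))
```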
Problem: The regression lines are $y = ax + b$ and $x = cy + d$. If the two variables have the same mean, show that $d(1 - a) = b(1 - c)$.
Solution:
The means of x and y are obtained by solving the regression lines for x and y.
Here the first line is $y = ax + b$ (1) and the second is $x = cy + d$ (2), that is, $y = \frac{1}{c}x - \frac{d}{c}$ (3).
From (3) and (1): $ax + b = \frac{1}{c}x - \frac{d}{c}$, so
$$\bar{x} = \frac{bc + d}{1 - ac}.$$
Then from (1):
$$\bar{y} = a\,\frac{bc + d}{1 - ac} + b = \frac{ad + b}{1 - ac}.$$
If the means of the variables are equal, we can write
$$\frac{bc + d}{1 - ac} = \frac{ad + b}{1 - ac}.$$
This gives $bc + d = ad + b$, that is, $d - ad = b - bc$, so
$$d(1 - a) = b(1 - c).$$
Problem: If the variables x and y satisfy the relation $ax + by + c = 0$, show that the correlation between x and y is −1 or +1 according as a and b are of the same sign or not.
Solution:
Since the variables satisfy the relation $ax + by + c = 0$, we can write this relation as a line of y on x, $y = -\frac{a}{b}x - \frac{c}{b}$, and as a line of x on y, $x = -\frac{b}{a}y - \frac{c}{a}$. Then the regression coefficients of y on x and of x on y are $-\frac{a}{b}$ and $-\frac{b}{a}$ respectively. The magnitude of the coefficient of correlation is the geometric mean of the regression coefficients:
$$|r_{xy}| = \sqrt{\left(-\frac{a}{b}\right)\left(-\frac{b}{a}\right)} = 1.$$
The correlation coefficient is +1 or −1 according as the regression coefficients are positive or negative. They are negative when a and b have the same sign, so r = −1 in that case and r = +1 otherwise.
When we consider two characteristics which are qualitative in nature, it is not possible to measure them numerically. For example, consider the ability in drawing (let it be X) and the ability in music (let it be Y). It is not possible to measure the values of X and Y numerically for an individual. But if there are n individuals, it is possible to rank these n individuals according to their ability in drawing (X) and according to their ability in music (Y). If these two characteristics have a high positive correlation, the ranks obtained for the individuals based on X and Y will be in the same order. If they have a high negative correlation, the ranks based on X and Y will be in reverse order. Using the ranks obtained for the n individuals on the characteristics X and Y, a method of finding the coefficient of correlation was derived by C. Spearman in 1904. The coefficient of correlation for two characteristics which is calculated from the ranks is known as Spearman's Rank Correlation Coefficient.
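Let $x_i$ and $y_i$ be the ranks of the i-th individual in X and Y. Since each set of ranks is a permutation of 1, 2, …, n,
$$\bar{x} = \bar{y} = \frac{1 + 2 + \dots + n}{n} = \frac{n+1}{2},$$
$$\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2 = \frac{1}{n}\cdot\frac{n(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2 = \frac{n^2 - 1}{12}.$$
Similarly, $\sigma_y^2 = \frac{n^2 - 1}{12}$.
Let $d_i = x_i - y_i$. Then $\bar{d} = \bar{x} - \bar{y} = 0$. Since $\bar{x} = \bar{y}$, we can rewrite $d_i$ as $d_i = (x_i - \bar{x}) - (y_i - \bar{y})$, so
$$\frac{1}{n}\sum_{i=1}^{n}d_i^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 + \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2 - \frac{2}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}),$$
$$\frac{1}{n}\sum_{i=1}^{n}d_i^2 = \sigma_x^2 + \sigma_y^2 - 2\operatorname{cov}(x, y) = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y.$$
Since $\sigma_x^2 = \sigma_y^2 = \frac{n^2-1}{12}$, we get
$$\frac{1}{n}\sum_{i=1}^{n}d_i^2 = \frac{n^2-1}{12} + \frac{n^2-1}{12} - 2r\,\frac{n^2-1}{12} = \frac{n^2-1}{6}(1 - r).$$
Hence
$$r = 1 - \frac{6\sum_{i=1}^{n}d_i^2}{n(n^2-1)}.$$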
Problem: The following are the ranks obtained by 10 students in Statistics and Mathematics.
Statistics:  1 2 3 4 5 6 7 8 9 10
Mathematics: 1 4 2 5 3 9 7 10 6 8
To what extent is the knowledge of the students in the two subjects related?
Solution:
Here we find the rank correlation coefficient of the ranks in Statistics and Mathematics. The rank correlation coefficient is defined as
$$r = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)},$$
where $d_i$ is the difference in ranks.
The calculations are:
Rank in Statistics (xi)   Rank in Mathematics (yi)   di = xi − yi   di²
1                         1                          0              0
2                         4                          -2             4
3                         2                          1              1
4                         5                          -1             1
5                         3                          2              4
6                         9                          -3             9
7                         7                          0              0
8                         10                         -2             4
9                         6                          3              9
10                        8                          2              4
Total                                                               Σdi² = 36
Hence,
$$r = 1 - \frac{6 \times 36}{10(10^2 - 1)} = 1 - 0.2182 = 0.7818.$$
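The derivation above implies that, for untied ranks, Spearman's formula is simply Pearson's r computed on the ranks. The sketch below (function names are illustrative) checks that identity on this problem and reproduces r ≈ 0.7818.

```python
import math


def spearman_rho(rank_x, rank_y):
    """r = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), valid for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def pearson_r(xs, ys):
    """Pearson's r computed directly from deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


statistics_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mathematics_ranks = [1, 4, 2, 5, 3, 9, 7, 10, 6, 8]
rho = spearman_rho(statistics_ranks, mathematics_ranks)
print(round(rho, 4), round(pearson_r(statistics_ranks, mathematics_ranks), 4))
```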
Problem: 10 competitors in a music test were ranked by three judges A, B and C in the following order.
Ranks by A: 1 6 5 10 3 2 4 9 7 8
Ranks by B: 3 5 8 4 7 10 2 1 6 9
Ranks by C: 6 4 9 8 1 2 3 10 5 7
Discuss which pair of judges has the nearest approach to common likings in music.
Solution:
Here we find the rank correlation coefficient between each pair of judges from the ranks they gave, and identify the pair with the highest correlation coefficient. That pair is considered to have the nearest approach to common likings in music.
by A (xi)   by B (yi)   by C (zi)   xi − yi   xi − zi   yi − zi   (xi − yi)²   (xi − zi)²   (yi − zi)²
1           3           6           -2        -5        -3        4            25           9
6           5           4           1         2         1         1            4            1
5           8           9           -3        -4        -1        9            16           1
10          4           8           6         2         -4        36           4            16
3           7           1           -4        2         6         16           4            36
2           10          2           -8        0         8         64           0            64
4           2           3           2         1         -1        4            1            1
9           1           10          8         -1        -9        64           1            81
7           6           5           1         2         1         1            4            1
8           9           7           -1        1         2         1            1            4
Total                                                             200          60           214
Rank correlation between A and B:
$$r = 1 - \frac{6 \times 200}{10(10^2 - 1)} = 1 - 1.2121 = -0.212$$
Rank correlation between A and C:
$$r = 1 - \frac{6 \times 60}{10(10^2 - 1)} = 1 - 0.3636 = 0.6364$$
Rank correlation between B and C:
$$r = 1 - \frac{6 \times 214}{10(10^2 - 1)} = 1 - 1.2970 = -0.297$$
It can be observed that judges A and C have the nearest approach to common likings in music.
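The three pairwise comparisons can be automated in a few lines: compute Spearman's coefficient for each pair of judges and pick the largest. This is an illustrative sketch, and the dictionary layout is an arbitrary choice. It reproduces −0.2121, 0.6364 and −0.2970 and selects the pair A and C.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


ranks = {
    "A": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "B": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "C": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
results = {p + q: spearman_rho(ranks[p], ranks[q]) for p, q in pairs}
best = max(results, key=results.get)   # pair with the highest rank correlation

for pair, rho in results.items():
    print(pair, round(rho, 4))
print("nearest approach to common likings:", best)
```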
Problem: Find the rank correlation coefficient for the following data:
X: 92 89 87 86 84 77 71 63 53 50
Y: 86 83 91 77 68 85 52 82 37 57
Solution:
First, the given values of X and Y should be ranked. If an observation repeats, the sum of the ranks it occupies is divided equally among the repeated observations. (For example, if a value a occupies the 6th and 7th positions when the observations are ranked in order, both values a are assigned the rank (6 + 7)/2 = 6.5.)
Here the observations are ranked in descending order. Then we find the rank correlation coefficient.
x    y    Rank of X (xi)   Rank of Y (yi)   di = xi − yi   di²
92   86   1                2                -1             1
89   83   2                4                -2             4
87   91   3                1                2              4
86   77   4                6                -2             4
84   68   5                7                -2             4
77   85   6                3                3              9
71   52   7                9                -2             4
63   82   8                5                3              9
53   37   9                10               -1             1
50   57   10               8                2              4
Total                                                      Σdi² = 44
$$r = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)} = 1 - \frac{6 \times 44}{10(10^2 - 1)} = 1 - 0.267 = 0.733$$
It may be noted that Spearman's rank correlation formula is derived on the assumption that all the ranks are different. In practice, however, there are many situations where more than one individual gets the same rank. In a competition, suppose three individuals received the 3rd rank. They would have been given the 3rd, 4th and 5th ranks if there had been a slight difference in the evaluation. We add 3, 4 and 5, which gives 12, and divide 12 equally among these three individuals. Hence we assign the rank 4 to each of them. In such situations it is more accurate to calculate Pearson's coefficient of correlation between the ranks directly, after assigning the average rank to the tied individuals. There is also a modified formula of Spearman's rank correlation coefficient, which is as follows:
$$r = 1 - \frac{6\left[\sum_{i=1}^{n} d_i^2 + \sum_i \frac{m_i(m_i^2-1)}{12} + \sum_j \frac{m_j(m_j^2-1)}{12}\right]}{n(n^2-1)},$$
where $m_i$ is the number of times the i-th rank repeats in the x series of ranks and $m_j$ is the number of times the j-th rank repeats in the y series of ranks when the average ranks are assigned. The method is illustrated below:
X: 15 20 28 12 40 60 20 80
Y: 40 30 50 30 20 10 30 60
Illustration:
At first we assign ranks to the X and Y values (in descending order). Here we have 8 pairs of data, that is, n = 8.
Rank of X: 7 5.5 4 8 3 2 5.5 1
Rank of Y: 3 5 2 5 7 8 5 1
Among the X values, 20 repeats twice, with the possible ranks 5 and 6. Hence their average, 5.5, is assigned to the value 20. Similarly, among the Y values, 30 repeats thrice, with the possible ranks 4, 5 and 6. Hence their average, 5, is assigned as the rank of the value 30.
Now the differences in ranks, $d_i = X_i - Y_i$, are:
di:  4    0.5    2   3   -4   -6   0.5    0
di²: 16   0.25   4   9   16   36   0.25   0
This gives $\sum_i d_i^2 = 81.50$. Here $m_i = 2$ (for X) and $m_j = 3$ (for Y), so
$$r = 1 - \frac{6\left[81.50 + \frac{2(2^2-1)}{12} + \frac{3(3^2-1)}{12}\right]}{8(8^2-1)} = 1 - \frac{6\,(81.50 + 0.5 + 2)}{8 \times 63} = 1 - \frac{504}{504} = 0.$$
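The averaging of tied ranks and the correction terms m(m² − 1)/12 can be sketched as follows. The helper names are hypothetical, and ranking is done in descending order as in the illustration. The sketch reproduces the average ranks above, Σd² = 81.5, and r = 0.

```python
def average_ranks(values):
    """Rank in descending order; tied values share the average of their positions."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1          # first position occupied by v
        count = order.count(v)              # number of times v repeats
        ranks.append(first + (count - 1) / 2)
    return ranks


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over the groups of tied values."""
    return sum(m * (m * m - 1) / 12 for m in (values.count(v) for v in set(values)))


def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    corrected = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * corrected / (n * (n * n - 1)), rx, ry, sum_d2


X = [15, 20, 28, 12, 40, 60, 20, 80]
Y = [40, 30, 50, 30, 20, 10, 30, 60]
r, rx, ry, sum_d2 = spearman_with_ties(X, Y)
print(rx, ry, sum_d2, r)
```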
In a statistical study, if many variables are included and we are interested in studying the joint effect of a group of variables upon a variable not included in that group, our study is one of multiple correlation and multiple regression.
For example, in a study of the yield of a crop per acre (let it be $X_1$), the value of the variable $X_1$ is a joint effect of the variables quality of seed $X_2$, fertility of soil $X_3$, fertilizer used $X_4$, irrigation facilities $X_5$, weather conditions $X_6$ and so on.
If we are considering the relation between two variables only, there are two alternatives:
(i) We consider only those members of the observed data in which the other variables have specified values; or
(ii) We may eliminate mathematically the effect of the other variables on the two variables under consideration.
[The first method has the disadvantage that it limits the size of the data, and it is applicable only to the data in which the other variables have the assigned values.]
In the second method it may not be possible to eliminate the entire influence of the other variables, but their linear effect can easily be eliminated. The correlation and regression between two variables only, after eliminating the linear effects of the other variables under consideration, are called partial correlation and partial regression.
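Let the observations on $X_1$, $X_2$ and $X_3$ be measured from their respective means, i.e.,
$$X_1 = x_{1i} - \bar{x}_1, \quad X_2 = x_{2i} - \bar{x}_2, \quad X_3 = x_{3i} - \bar{x}_3.$$
The coefficients $b_{12.3}$ and $b_{13.2}$ are the partial regression coefficients of $X_1$ on $X_2$ and of $X_1$ on $X_3$ respectively.
$e_{1.23} = b_{12.3}X_2 + b_{13.2}X_3$ is called the estimate of $X_1$ as given by the equation of the plane of regression (2).
The quantity $X_{1.23} = X_1 - b_{12.3}X_2 - b_{13.2}X_3$ is called the error of estimate or residual.
In the subscript of the residual $X_{1.23}$, the subscript before the '.', i.e. 1, is known as the primary subscript, and the subscripts after it, i.e. 2 and 3, are called the secondary subscripts.
In the equation of the plane of regression given in (2), the constants b are determined by the principle of least squares, i.e. by minimising $S = \sum X_{1.23}^2 = \sum (X_1 - b_{12.3}X_2 - b_{13.2}X_3)^2$:
$$\frac{\partial S}{\partial b_{12.3}} = 0 \;\Rightarrow\; -2\sum X_2\,(X_1 - b_{12.3}X_2 - b_{13.2}X_3) = 0$$
$$\frac{\partial S}{\partial b_{13.2}} = 0 \;\Rightarrow\; -2\sum X_3\,(X_1 - b_{12.3}X_2 - b_{13.2}X_3) = 0$$
that is, $\sum X_2X_{1.23} = 0$ and $\sum X_3X_{1.23} = 0$, or
$$\sum X_1X_2 - b_{12.3}\sum X_2^2 - b_{13.2}\sum X_2X_3 = 0$$
$$\sum X_1X_3 - b_{12.3}\sum X_2X_3 - b_{13.2}\sum X_3^2 = 0 \qquad (3)$$
Since the $X_i$'s are measured from their respective means, we have
$$\sigma_i^2 = \frac{1}{N}\sum X_i^2, \qquad \operatorname{cov}(X_i, X_j) = \frac{1}{N}\sum X_iX_j, \qquad r_{ij} = \frac{\operatorname{cov}(X_i, X_j)}{\sigma_i\sigma_j}.$$
If we write
$$\omega = \begin{vmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{vmatrix}$$
and $\omega_{ij}$ is the cofactor of the $(i, j)$-th element of $\omega$, then
$$b_{12.3} = -\frac{\sigma_1}{\sigma_2}\frac{\omega_{12}}{\omega_{11}} \quad\text{and}\quad b_{13.2} = -\frac{\sigma_1}{\sigma_3}\frac{\omega_{13}}{\omega_{11}}.$$
Now we get
$$X_1 = -\frac{\sigma_1}{\sigma_2}\frac{\omega_{12}}{\omega_{11}}X_2 - \frac{\sigma_1}{\sigma_3}\frac{\omega_{13}}{\omega_{11}}X_3.$$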
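The cofactor expressions can be checked against a direct solution of the normal equations (3). The sketch below uses small made-up observations on three variables, which are hypothetical and serve only for illustration. It solves (3) by Cramer's rule, recomputes b₁₂.₃ and b₁₃.₂ from σᵢ, rᵢⱼ and the cofactors ω₁₁, ω₁₂ and ω₁₃, and prints both versions side by side.

```python
import math

# Hypothetical observations on three variables, for illustration only.
x1 = [10, 12, 15, 11, 18, 20, 14, 16]
x2 = [3, 4, 6, 3, 7, 8, 5, 6]
x3 = [8, 7, 9, 10, 6, 7, 9, 8]
N = len(x1)


def deviations(v):
    """Measure a variable from its mean, as assumed in the text."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def S(a, b):
    """Sum of products, e.g. S(X1, X2) is the sum of X1*X2."""
    return sum(p * q for p, q in zip(a, b))


X1, X2, X3 = deviations(x1), deviations(x2), deviations(x3)

# Normal equations (3), solved by Cramer's rule.
det = S(X2, X2) * S(X3, X3) - S(X2, X3) ** 2
b12_3 = (S(X1, X2) * S(X3, X3) - S(X1, X3) * S(X2, X3)) / det
b13_2 = (S(X1, X3) * S(X2, X2) - S(X1, X2) * S(X2, X3)) / det

# Cofactor form: b12.3 = -(s1/s2) w12/w11 and b13.2 = -(s1/s3) w13/w11.
s1, s2, s3 = (math.sqrt(S(X, X) / N) for X in (X1, X2, X3))
r12 = S(X1, X2) / (N * s1 * s2)
r13 = S(X1, X3) / (N * s1 * s3)
r23 = S(X2, X3) / (N * s2 * s3)
w11 = 1 - r23 ** 2                  # cofactor of the (1,1) element
w12 = -(r12 - r23 * r13)            # cofactor of the (1,2) element
w13 = r12 * r23 - r13               # cofactor of the (1,3) element
print(b12_3, -(s1 / s2) * w12 / w11)
print(b13_2, -(s1 / s3) * w13 / w11)
```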
(i) The sum of the products of any residual of order zero with any other residual of higher order is zero, provided the subscript of the former occurs among the secondary subscripts of the latter.
(iii) The sum of the products of two residuals is zero if all the subscripts (primary as well as secondary) of the one occur among the secondary subscripts of the other. E.g., $\sum X_{1.2}X_{3.12} = 0$, $\sum X_{2.3}X_{1.23} = 0$.
That is, $R_{1.23} = \dfrac{\operatorname{cov}(X_1, e_{1.23})}{\sqrt{V(X_1)\,V(e_{1.23})}}$, which is derived as,
The correlation coefficient between $X_1$ and $X_2$ after the linear effect of $X_3$ on each of them has been eliminated is called the partial correlation coefficient of $X_1$ and $X_2$, denoted $r_{12.3}$.
Similarly, $X_{2.3} = X_2 - b_{23}X_3$ is the part of $X_2$ obtained after eliminating the linear effect of $X_3$.
This is derived as
$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}.$$
In a similar way, the expressions for $r_{13.2}$ and $r_{23.1}$ can be obtained.
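The definition and the formula for r₁₂.₃ describe the same quantity. Removing the linear effect of X₃ from X₁ and from X₂ and then correlating the two residuals gives the same number as the closed-form expression. The sketch below checks this on made-up data (hypothetical, for illustration only); the helper names are illustrative.

```python
import math

x1 = [10, 12, 15, 11, 18, 20, 14, 16]   # hypothetical data
x2 = [3, 4, 6, 3, 7, 8, 5, 6]
x3 = [8, 7, 9, 10, 6, 7, 9, 8]


def centred(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def corr(a, b):
    a, b = centred(a), centred(b)
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def residual(y, x):
    """Part of y left after removing its linear regression on x."""
    y, x = centred(y), centred(x)
    b = dot(y, x) / dot(x, x)
    return [p - b * q for p, q in zip(y, x)]


r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
formula = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
via_residuals = corr(residual(x1, x3), residual(x2, x3))
print(round(formula, 6), round(via_residuals, 6))
```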
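Solution:
(i) We have
$$r_{23.1} = \frac{r_{23} - r_{21}r_{31}}{\sqrt{(1 - r_{21}^2)(1 - r_{31}^2)}} = 0.2425.$$
(iii)
$$b_{13.2} = -\frac{\sigma_1}{\sigma_3}\frac{\omega_{13}}{\omega_{11}} = \frac{\sigma_1}{\sigma_3}\cdot\frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2}$$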