
Chapter 30

Correlation and Regression

Univariate and Bivariate distribution

"If it is proved true that in a large number of instances two variables tend always to fluctuate in the same or in opposite directions, we consider that the fact is established and that a relationship exists. This relationship is called correlation."

(1) Univariate distribution : These are the distributions in which there is only one variable, such as the heights of the students of a class.

(2) Bivariate distribution : A distribution involving two discrete variables is called a bivariate distribution, for example, the heights and the weights of the students of a class in a school.

(3) Bivariate frequency distribution : Let x and y be two variables. Suppose x takes the values $x_1, x_2, \ldots, x_n$ and y takes the values $y_1, y_2, \ldots, y_n$; then we record our observations in the form of ordered pairs $(x_i, y_j)$, where $1 \le i \le n$, $1 \le j \le n$. If a certain pair $(x_i, y_j)$ occurs $f_{ij}$ times, we say that its frequency is $f_{ij}$. The function which assigns the frequencies $f_{ij}$ to the pairs $(x_i, y_j)$ is known as a bivariate frequency distribution.

Covariance

Let $(x_i, y_i)$; $i = 1, 2, \ldots, n$ be a bivariate distribution, where $x_1, x_2, \ldots, x_n$ are the values of the variable x and $y_1, y_2, \ldots, y_n$ those of y. Then the covariance Cov(x, y) between x and y is given by

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

or

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y},$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ are the means of the variables x and y respectively.

Covariance is not affected by a change of origin, but it is affected by a change of scale.
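The two forms of Cov(x, y) are algebraically identical, and the origin-shift property can be checked numerically. A minimal Python sketch (the data values and the shift are illustrative, not from the text):

```python
# Check that the two covariance formulas agree, and that covariance
# is unchanged by a shift of origin (illustrative data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Cov(x, y) = (1/n) * sum of (x_i - x_bar)(y_i - y_bar)
cov_dev = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

# Cov(x, y) = (1/n) * sum of x_i*y_i  -  x_bar*y_bar
cov_prod = sum(xi * yi for xi, yi in zip(x, y)) / n - x_bar * y_bar

print(cov_dev, cov_prod)   # both print 1.2

# Shift of origin: replacing x by x+10 and y by y-3 leaves Cov unchanged;
# a change of scale (e.g. x -> 2x) would multiply it instead.
xs = [xi + 10 for xi in x]
ys = [yi - 3 for yi in y]
cov_shift = sum(a * b for a, b in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)
print(cov_shift)           # still 1.2
```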
Correlation

The relationship between two variables such that a change in one variable results in a positive or negative change in the other variable is known as correlation.

(1) Types of correlation : (i) Perfect correlation : If the two variables vary in such a manner that their ratio is always constant, then the correlation is said to be perfect.

(ii) Positive or direct correlation : If an increase or decrease in one variable corresponds to an increase or decrease in the other, the correlation is said to be positive.

(iii) Negative or indirect correlation : If an increase or decrease in one variable corresponds to a decrease or increase in the other, the correlation is said to be negative.

(2) Karl Pearson's coefficient of correlation : The correlation coefficient $r(x, y)$ between two variables x and y is given by

$$r(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} \quad\text{or}\quad \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y},$$

i.e.,

$$r(x, y) = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\;\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}.$$
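As a quick numerical check (same illustrative data as above), the raw-sum form of Karl Pearson's formula gives the same value as Cov(x, y)/(σₓσᵧ):

```python
import math

# Illustrative data, not from the text.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Raw-sum form: r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2)(n*Syy - Sy^2))
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
syy = sum(yi * yi for yi in y)
r_raw = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))

# Definition form: r = Cov(x, y) / (sigma_x * sigma_y)
x_bar, y_bar = sx / n, sy / n
cov = sxy / n - x_bar * y_bar
sigma_x = math.sqrt(sxx / n - x_bar**2)
sigma_y = math.sqrt(syy / n - y_bar**2)
r_def = cov / (sigma_x * sigma_y)

print(round(r_raw, 4), round(r_def, 4))   # identical, about 0.7746
```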
r  1 , it means that there is perfect correlation in the two
(x  x )(y  y )  dxdy characteristics i.e., every individual is getting the same ranks in the two
r   . characteristics.
( x  x )2 (y  y )2  dx 2  dy 2 r  1 , it means that if the rank of one characteristics is high,
(3) Modified formula : then that of the other is low or if the rank of one characteristics is low,

 dxdy   n
dx . dy then that of the other is high.
r  1 , it means that there is perfect negative correlation in
r the two characteristics i.e, an individual getting highest rank in one

  dx    dy   dy 
2 2 

characteristic is getting the lowest rank in the second characteristic.


 dx 2

n
 
 n
2


Here the rank, in the two characteristics in a group of n individuals are
of the type (1, n), (2, n  1),....., (n, 1) .
  
where dx  x  x ; dy  y  y r  0 , it means that no relation can be established between the
two characteristics.
Cov ( x , y ) Cov ( x , y )
Also, rxy   . Standard error and probable error
 x y Var ( x ).Var (y )
(4) Step deviation method : Let A and B be assumed means of the $x_i$ and $y_i$ respectively. Then

$$r(x, y) = \frac{\sum u_i v_i - \dfrac{1}{n}\sum u_i \sum v_i}{\sqrt{\left(\sum u_i^2 - \dfrac{1}{n}\left(\sum u_i\right)^2\right)\left(\sum v_i^2 - \dfrac{1}{n}\left(\sum v_i\right)^2\right)}},$$

where $u_i = x_i - A$ and $v_i = y_i - B$.
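Because u and v merely shift the data, the value of r does not depend on the choice of A and B. A minimal sketch (the data and the assumed means are illustrative):

```python
import math

def r_step(x, y, A, B):
    """Pearson's r from deviations u = x - A, v = y - B (assumed means A, B)."""
    n = len(x)
    u = [xi - A for xi in x]
    v = [yi - B for yi in y]
    su, sv = sum(u), sum(v)
    num = sum(ui * vi for ui, vi in zip(u, v)) - su * sv / n
    den = math.sqrt((sum(ui**2 for ui in u) - su**2 / n) *
                    (sum(vi**2 for vi in v) - sv**2 / n))
    return num / den

x = [61, 64, 66, 69, 70]
y = [112, 120, 125, 131, 134]
print(r_step(x, y, A=66, B=125))   # the choice of A and B ...
print(r_step(x, y, A=0, B=0))      # ... does not change r
```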
Rank correlation

Let us suppose that a group of n individuals is arranged in order of merit or proficiency in possession of two characteristics A and B. These ranks in the two characteristics will, in general, be different. For example, if we consider the relation between intelligence and beauty, it is not necessary that a beautiful individual is intelligent also.

Rank correlation is given by Spearman's formula

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)},$$

where $\sum d^2$ is the sum of the squares of the differences of the two ranks and n is the number of pairs of observations.

We always have $\sum d_i = \sum (x_i - y_i) = \sum x_i - \sum y_i = n\bar{x} - n\bar{y} = 0$, since both sets of ranks have the same mean $(\bar{x} = \bar{y})$.

If all d's are zero, then $r = 1$, which shows that there is perfect rank correlation between the variables, and this is the maximum value of r. If however some values of $x_i$ are equal (tied ranks), then the coefficient of rank correlation is given by

$$r = 1 - \frac{6\left[\sum d^2 + \dfrac{1}{12}\sum (m^3 - m)\right]}{n(n^2 - 1)},$$

where m is the number of times a particular $x_i$ is repeated.
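A minimal sketch of Spearman's formula for untied ranks (the rank lists are illustrative); it also exhibits the r = 1 and r = -1 cases discussed next:

```python
def rank_corr(rank_x, rank_y):
    """Spearman's rank correlation 1 - 6*sum(d^2)/(n(n^2-1)), untied ranks."""
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

rank_a = [1, 2, 3, 4, 5]
print(rank_corr(rank_a, [1, 2, 3, 4, 5]))   #  1.0  (identical ranks, all d = 0)
print(rank_corr(rank_a, [5, 4, 3, 2, 1]))   # -1.0  (ranks of type (1,n),...,(n,1))
print(rank_corr(rank_a, [2, 1, 4, 3, 5]))   #  0.8
```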
Positive and Negative rank correlation coefficients

Let r be the rank correlation coefficient. If $r > 0$, it means that if the rank of one characteristic is high, then that of the other is also high, or if the rank of one characteristic is low, then that of the other is also low.

If $r = 1$, it means that there is perfect correlation in the two characteristics, i.e., every individual is getting the same ranks in the two characteristics.

If $r < 0$, it means that if the rank of one characteristic is high, then that of the other is low, or if the rank of one characteristic is low, then that of the other is high.

If $r = -1$, it means that there is perfect negative correlation in the two characteristics, i.e., an individual getting the highest rank in one characteristic is getting the lowest rank in the second characteristic. Here the ranks in the two characteristics in a group of n individuals are of the type $(1, n), (2, n-1), \ldots, (n, 1)$.

If $r = 0$, it means that no relation can be established between the two characteristics.

Standard error and probable error

(1) Standard error of prediction : The deviation of the predicted value from the observed value is known as the standard error of prediction and is defined as

$$S_y = \sqrt{\frac{\sum (y - y_p)^2}{n}},$$

where y is the actual value and $y_p$ is the predicted value.

In relation to the coefficient of correlation, it is given by

(i) Standard error of estimate of x is $S_x = \sigma_x\sqrt{1 - r^2}$.

(ii) Standard error of estimate of y is $S_y = \sigma_y\sqrt{1 - r^2}$.

(2) Relation between probable error and standard error : If r is the correlation coefficient in a sample of n pairs of observations, then its standard error is

$$\mathrm{S.E.}(r) = \frac{1 - r^2}{\sqrt{n}}$$

and its probable error is

$$\mathrm{P.E.}(r) = 0.6745\,(\mathrm{S.E.}) = 0.6745\left(\frac{1 - r^2}{\sqrt{n}}\right).$$

The probable error or the standard error is used for interpreting the coefficient of correlation:

(i) If $r < \mathrm{P.E.}(r)$, there is no evidence of correlation.

(ii) If $r > 6\,\mathrm{P.E.}(r)$, the existence of correlation is certain.

The square of the coefficient of correlation for a bivariate distribution is known as the "coefficient of determination".
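A short sketch of the probable-error test (the sample values of r and n are illustrative):

```python
import math

def probable_error(r, n):
    """P.E.(r) = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

r, n = 0.8, 64
pe = probable_error(r, n)      # 0.6745 * 0.36 / 8, about 0.0304
if abs(r) > 6 * pe:
    print("existence of correlation is certain")   # this branch fires here
elif abs(r) < pe:
    print("no evidence of correlation")
else:
    print("inconclusive; compare r with P.E.(r)")
```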
Regression

Linear regression

If a relation between two variates x and y exists, then the dots of the scatter diagram will more or less be concentrated around a curve which is called the curve of regression. If this curve is a straight line, then it is known as the line of regression and the regression is called linear regression.

Line of regression : The line of regression is the straight line which in the least square sense gives the best fit to the given frequency distribution.

Equations of lines of regression

(1) Regression line of y on x : If the value of x is known, then the value of y can be found from

$$y - \bar{y} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2}(x - \bar{x}) \quad\text{or}\quad y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x}).$$

(2) Regression line of x on y : It estimates x for the given value of y as

$$x - \bar{x} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2}(y - \bar{y}) \quad\text{or}\quad x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y}).$$
(3) Regression coefficients : (i) The regression coefficient of y on x is

$$b_{yx} = \frac{r\,\sigma_y}{\sigma_x} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2}.$$

(ii) The regression coefficient of x on y is

$$b_{xy} = \frac{r\,\sigma_x}{\sigma_y} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2}.$$
Angle between two lines of regression

The equations of the two lines of regression are $y - \bar{y} = b_{yx}(x - \bar{x})$ and $x - \bar{x} = b_{xy}(y - \bar{y})$.

We have $m_1 =$ slope of the line of regression of y on x $= b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$, and $m_2 =$ slope of the line of regression of x on y $= \dfrac{1}{b_{xy}} = \dfrac{\sigma_y}{r\,\sigma_x}$. Therefore

$$\tan\theta = \pm\,\frac{m_2 - m_1}{1 + m_1 m_2} = \pm\,\frac{\dfrac{\sigma_y}{r\sigma_x} - \dfrac{r\sigma_y}{\sigma_x}}{1 + \dfrac{r\sigma_y}{\sigma_x}\cdot\dfrac{\sigma_y}{r\sigma_x}} = \pm\,\frac{(1 - r^2)\,\sigma_x\sigma_y}{r\,(\sigma_x^2 + \sigma_y^2)}.$$

Here the positive sign gives the acute angle $\theta$, because $r^2 \le 1$ and $\sigma_x, \sigma_y$ are positive. Hence

$$\tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}. \qquad (i)$$

If $r = 0$, from (i) we conclude $\tan\theta = \infty$ or $\theta = \pi/2$, i.e., the two regression lines are at right angles.

If $r = \pm 1$, then $\tan\theta = 0$, i.e., $\theta = 0$ since $\theta$ is acute, i.e., the two regression lines coincide.
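Formula (i) in numbers (the σ values are illustrative; |r| is used so the acute angle comes out for either sign of r):

```python
import math

def angle_between_regression_lines(r, sigma_x, sigma_y):
    """Acute angle (radians) between the two regression lines, from (i)."""
    if r == 0:
        return math.pi / 2   # lines are perpendicular
    tan_theta = (1 - r * r) / abs(r) * (sigma_x * sigma_y) / (sigma_x**2 + sigma_y**2)
    return math.atan(tan_theta)

print(angle_between_regression_lines(1.0, 2, 3))   # 0.0: lines coincide
print(angle_between_regression_lines(0.0, 2, 3))   # pi/2: right angles
print(math.degrees(angle_between_regression_lines(0.5, 2, 3)))  # ~34.7 degrees
```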
Important points about regression coefficients bxy and byx

(1) $r = \sqrt{b_{yx}\,b_{xy}}$, i.e., the coefficient of correlation is the geometric mean of the coefficients of regression.

(2) If $b_{yx} > 1$, then $b_{xy} < 1$, i.e., if one of the regression coefficients is greater than unity, the other will be less than unity.

(3) If the correlation between the variables is not perfect, then the regression lines intersect at $(\bar{x}, \bar{y})$.

(4) $b_{yx}$ is called the slope of the regression line of y on x and $\dfrac{1}{b_{xy}}$ is called the slope of the regression line of x on y.

(5) $b_{yx} + b_{xy} \ge 2\sqrt{b_{yx}\,b_{xy}} = 2r$, i.e., the arithmetic mean of the regression coefficients is greater than (or equal to) the correlation coefficient.

(6) Regression coefficients are independent of change of origin but not of scale.

(7) The product of the gradients of the lines of regression is $\dfrac{\sigma_y^2}{\sigma_x^2}$.

(8) If both the lines of regression coincide, then the correlation will be perfect linear.

(9) If both $b_{yx}$ and $b_{xy}$ are positive, then r will be positive, and if both $b_{yx}$ and $b_{xy}$ are negative, then r will be negative.

• Correlation is a pure number and hence unitless.

• The correlation coefficient is not affected by change of origin and scale.

• If two variates are connected by the linear relation $x + y = K$, then x and y are in perfect indirect correlation. Here $r = -1$.

• If x and y are two independent variables, then $\rho(x + y,\, x - y) = \dfrac{\sigma_x^2 - \sigma_y^2}{\sigma_x^2 + \sigma_y^2}$.

• If the regression lines are $y = ax + b$ and $x = cy + d$, then $\bar{x} = \dfrac{bc + d}{1 - ac}$ and $\bar{y} = \dfrac{ad + b}{1 - ac}$ (see the sketch after this list).

• If $b_{yx}$, $b_{xy}$ and r are all positive, then $\frac{1}{2}(b_{xy} + b_{yx}) \ge r$, and if $b_{xy}$, $b_{yx}$ and r are all negative, then $\frac{1}{2}(b_{xy} + b_{yx}) \le r$.

• Correlation measures the relationship between variables, while regression measures only the cause and effect of the relationship between the variables.

• If the line of regression of y on x makes an angle $\alpha$ with the positive direction of the X-axis, then $\tan\alpha = b_{yx}$.

• If the line of regression of x on y makes an angle $\beta$ with the positive direction of the X-axis, then $\cot\beta = b_{xy}$.
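The formulas for $\bar{x}$ and $\bar{y}$ above follow from solving the two lines simultaneously, since both pass through $(\bar{x}, \bar{y})$. A quick check (a, b, c, d are arbitrary illustrative values):

```python
# Regression lines y = a*x + b and x = c*y + d intersect at (x_bar, y_bar).
# Substituting one into the other gives:
#   x_bar = (b*c + d)/(1 - a*c),  y_bar = (a*d + b)/(1 - a*c)
a, b = 0.6, 2.0    # illustrative y-on-x line:  y = 0.6x + 2
c, d = 1.0, -1.0   # illustrative x-on-y line:  x = y - 1

x_bar = (b * c + d) / (1 - a * c)
y_bar = (a * d + b) / (1 - a * c)
print(x_bar, y_bar)   # (2.5, 3.5)

# The point lies on both lines:
assert abs(y_bar - (a * x_bar + b)) < 1e-12
assert abs(x_bar - (c * y_bar + d)) < 1e-12
```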
Correlation
1. If Var (x) = 8.25, Var (y) = 33.96 and Cov (x, y) = 10.2, then
the correlation coefficient is
(a) 0.89 (b) – 0.98
(c) 0.61 (d) – 0.16
2. When the origin is changed, then the coefficient of correlation
(a) Becomes zero (b) Varies
(c) Remains fixed (d) None of these
3. If r is the correlation coefficient between two variables, then
[MP PET 1995]
(a) r 1 (b) r 1
(c) | r |  1 (d) | r |  1
4. If r is the coefficient of correlation and $Y = a + bX$, then $|r| =$
(a) $\dfrac{a}{b}$ (b) $\dfrac{b}{a}$
(c) 1 (d) None of these
5. Karl Pearson's coefficient of correlation between the heights (in
inches) of teachers and students corresponding to the given data
is [MP PET 1993]
Height of teachers x : 66 67 68 69 70
Height of students y : 68 66 69 72 70
(a) $\dfrac{1}{2}$ (b) 2
(c) $-\dfrac{1}{2}$ (d) 0